If you are dealing with file data purely as byte arrays and copying them
from one place to another, you need not worry about the language or charset
since the bytes are preserved.
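
As a minimal sketch of that byte-for-byte case (the class and method names here are made up for illustration and are not part of Apex Malhar), a plain copy loop never interprets the bytes as characters, so accented text arrives unchanged. In-memory streams stand in for the HDFS streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteCopyDemo {
  // Copies every byte from in to out without decoding them as characters.
  static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }

  // Sends the given bytes through copy() and returns what arrived at the sink.
  static byte[] roundTrip(byte[] source) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try {
      copy(new ByteArrayInputStream(source), sink);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    // A French word containing a c-cedilla, encoded as UTF-8 bytes.
    byte[] source = "fran\u00e7ais".getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.equals(source, roundTrip(source)));
  }
}
```

No charset appears anywhere in the copy path, which is why this case needs no special handling.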

If you convert the bytes to Strings explicitly, or use classes that might
do so implicitly, you need to specify an appropriate *Charset* for the
conversion.
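
To illustrate why the charset matters, here is a small sketch (the class and method names are hypothetical) that decodes the same UTF-8 bytes twice, once with the matching charset and once with ISO-8859-1. The accented letters are written as Unicode escapes so the example does not depend on the encoding of the source file itself:

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
  // Decodes raw bytes as UTF-8 text.
  static String asUtf8(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8);
  }

  // Decodes the same bytes as ISO-8859-1, where each byte becomes one character.
  static String asLatin1(byte[] raw) {
    return new String(raw, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    // Three characters (e-acute, t, e-acute) that occupy five bytes in UTF-8.
    byte[] raw = "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8);
    System.out.println(asUtf8(raw).length());   // 3
    System.out.println(asLatin1(raw).length()); // 5
  }
}
```

With the wrong charset each two-byte accented letter turns into two unrelated characters, which is the usual symptom of mangled French text.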

*InputStreamReader* has one constructor that takes a *Charset* object and
another that takes a charset name as a string.

Standard *Charset* objects are available as static fields of the
*StandardCharsets* class:
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
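
A rough sketch of the two *InputStreamReader* constructors side by side (the class and helper names are invented for illustration, and an in-memory stream stands in for a real file). The *Charset* form cannot fail on a misspelled name, while the string form throws *UnsupportedEncodingException* if the name is unknown:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {
  // Reads the first line, decoding with a Charset object.
  static String firstLine(InputStream in, Charset cs) {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(in, cs))) {
      return br.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the first line, decoding with a charset name. An unknown name raises
  // UnsupportedEncodingException, which is a subclass of IOException.
  static String firstLine(InputStream in, String charsetName) {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charsetName))) {
      return br.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // "Noel" with a diaeresis on the e, followed by a second line.
    byte[] raw = "No\u00ebl\nsecond line".getBytes(StandardCharsets.UTF_8);
    System.out.println(firstLine(new ByteArrayInputStream(raw), StandardCharsets.UTF_8).length()); // 4
    System.out.println(firstLine(new ByteArrayInputStream(raw), "UTF-8").length());               // 4
  }
}
```

Preferring the *StandardCharsets* constant avoids both the typo risk and the extra checked exception.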

So, *if you know for sure that your input file is encoded in UTF-8*, you can
create a *BufferedReader* that wraps an *InputStreamReader* as shown below
(very similar to the code in *LineByLineFileInputOperator* in Apex Malhar):

------------------------------------------------------------------------------
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
...

  protected transient BufferedReader br;

  @Override
  protected InputStream openFile(Path path) throws IOException
  {
    InputStream is = super.openFile(path);
    // Decode the raw bytes as UTF-8 when reading lines.
    br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
    return is;
  }

  @Override
  protected void closeFile(InputStream is) throws IOException
  {
    super.closeFile(is);
    br.close();
    br = null;
  }

  @Override
  protected String readEntity() throws IOException
  {
    return br.readLine();
  }
------------------------------------------------------------------------------

Ram

On Tue, Aug 9, 2016 at 8:29 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
[email protected]> wrote:

> Hi,
>
> I have files on HDFS with French characters that I need to write to
> another file on HDFS. I am using AbstractFileInputOperator.java, which has
> the following method that streams the input file. Can you please suggest
> how I should handle the French characters? (I suppose I should pass the
> character encoding UTF-8 when creating the input stream, but I am not sure
> how to achieve that.)
>
> ############### method from AbstractFileInputOperator.java ###############
>
> protected InputStream openFile(Path path) throws IOException
>   {
>     currentFile = path.toString();
>     offset = 0;
>     retryCount = 0;
>     skipCount = 0;
>     LOG.info("opening file {}", path);
>     InputStream input = fs.open(path);
>     return input;
>   }
>
> Regards,
> Surya Vamshi
