Hi Nikita,

This is occurring because en_GB does not have a translations file. It's a warning and the code falls back to using en_US.

Karl

On Wed, Dec 12, 2018 at 4:39 AM Nikita Ahuja <[email protected]> wrote:

> Hi Karl,
>
> Thanks for the suggestion and Language for the data and content is able to
> detect now. But there is one issue while ingesting the records in the
> ElasticSearch Index. and it is stored there in the log file as:
>
> ERROR 2018-12-11T19:19:37,637 (qtp348148678-561) - Missing resource bundle
> 'org.apache.manifoldcf.ui.i18n.common' for locale 'en_GB': Can't find
> bundle for base name org.apache.manifoldcf.ui.i18n.common, locale en_GB;
> trying en
> java.util.MissingResourceException: Can't find bundle for base name
> org.apache.manifoldcf.ui.i18n.common, locale en_GB
>     at java.base/java.util.ResourceBundle.throwMissingResourceException(Unknown Source) ~[?:?]
>     at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>     at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>     at java.base/java.util.ResourceBundle.getBundle(Unknown Source) ~[?:?]
>     at org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:132) [mcf-core.jar:?]
>     at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) [mcf-core.jar:?]
>     at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) [mcf-core.jar:?]
>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:343) [mcf-ui-core.jar:?]
>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:119) [mcf-ui-core.jar:?]
>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:67) [mcf-ui-core.jar:?]
>     at org.apache.jsp.index_jsp._jspService(index_jsp.java:212) [jsp/:?]
>
> Is this can be resolved after adding any resource files or any other
> solution has to be opted?
>
> On Wed, Nov 21, 2018 at 5:36 PM Karl Wright <[email protected]> wrote:
>
>> Hi Nikita,
>>
>> The Tika transformer may well generate a language attribute. You would
>> need to check with Tika, though, to know for sure, and under what
>> conditions it might generate this. It should not be confused with document
>> format detection, which Tika definitely does in order to extract content.
>>
>> It looks like language detection in Tika either comes from document
>> metadata already present, or via a Java interface that you need to
>> explicitly call to get it. If your documents need the latter, the Tika
>> connector does not currently do this:
>>
>> https://tika.apache.org/1.19.1/detection.html#Language_Detection
>>
>> and
>>
>> https://tika.apache.org/1.19.1/examples.html#Language_Identification
>>
>> The documentation does not clarify whether a language attribute is
>> actually generated; the architecture seems more suited to plug in machine
>> translators for your content. I suspect you would need to run the output
>> of the Tika translator into the NullOutputConnector in order to see what
>> attributes are being generated to know for sure.
>>
>> Karl
>>
>> On Wed, Nov 21, 2018 at 4:45 AM Nikita Ahuja <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> Thanks for the timely replies. But I am basically concerned for the
>>> language detection of the .doc,.pdf or any other data present in the
>>> repository.
>>>
>>> As per my understanding Tika Transformation provides functionality for
>>> the same.
>>> But there is no output for the language of the documents.
>>>
>>> The sequence used is:
>>> 1. Repository Connector(Web)
>>> 2. Tika Transformation
>>> 3. MetaData Adjuster
>>> 4. Output Connector(Elastic)
>>>
>>> Is there anything which is being missed here for the language detection
>>> of the documents?
>>>
>>> On Wed, Nov 21, 2018 at 2:35 PM Furkan KAMACI <[email protected]> wrote:
>>>
>>>> Hi Nikita,
>>>>
>>>> First of all, OpenNLP is a transformation connector at ManifoldCF and
>>>> should be enabled by default. It extracts named entities (people, locations
>>>> and organizations) from document.
>>>>
>>>> You should download trained models to run OpenNLP connector. You can
>>>> check here for such purpose: https://opennlp.apache.org/models.html
>>>>
>>>> Check here for a detailed explanation:
>>>> https://github.com/ChalithaUdara/OpenNLP-Manifold-Connector
>>>>
>>>> Feel free to ask any questions when you try to integrate it. Also, you
>>>> should explain the points if you cannot success to run it.
>>>>
>>>> Kind Regards,
>>>> Furkan KAMACI
>>>>
>>>> On Wed, Nov 21, 2018 at 11:54 AM Karl Wright <[email protected]> wrote:
>>>>
>>>>> Hi Nikita,
>>>>>
>>>>> Can you be more specific when you say "OpenNLP is not working"? All
>>>>> that this connector does is integrate OpenNLP as a ManifoldCF transformer.
>>>>> It uses a specific directory to deliver the models that OpenNLP uses to
>>>>> match and extract content from documents. Thus, you can provide any
>>>>> models you want that are compatible with the OpenNLP version we're
>>>>> including.
>>>>>
>>>>> Can you describe the steps you are taking and what you are seeing?
>>>>>
>>>>> On Wed, Nov 21, 2018 at 12:44 AM Nikita Ahuja <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have query related to detect the language of the records/data which
>>>>>> is going to be ingest in the Output Connector.
>>>>>>
>>>>>> OpenNLP connector is not working for the detection as per the user
>>>>>> documentation, but this is not working appropriately. Please suggest is
>>>>>> NLP has to be used if yes, then how it should be used or is there any
>>>>>> other solution for this?
>>>>>>
>>>>>> --
>>>>>> Thanks and Regards,
>>>>>> Nikita
>>>>>> Email: [email protected]
>>>>>> United Sources Service Pvt. Ltd.
>>>>>> a "Smartshore" Company
>>>>>> Mobile: +91 99 888 57720
>>>>>> http://www.smartshore.nl
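
For anyone reading this thread later: the "trying en" part of the warning reflects a lookup that steps from the most specific locale to less specific ones. The sketch below is not ManifoldCF's Messages class, which has its own fallback logic; it only asks the JDK's standard ResourceBundle.Control which locales a default lookup would try for en_GB. The class name BundleFallbackSketch and the helper candidateLocales are made up for illustration, and which bundle is finally used depends on the translation files actually shipped.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleFallbackSketch {

    // Hypothetical helper: returns the locales the JDK's default
    // ResourceBundle lookup would try, most specific first.
    static List<Locale> candidateLocales(String baseName, Locale requested) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, requested);
    }

    public static void main(String[] args) {
        // Base name taken from the log message in this thread.
        String baseName = "org.apache.manifoldcf.ui.i18n.common";

        for (Locale candidate : candidateLocales(baseName, Locale.UK)) {
            // Locale.ROOT prints as an empty string, so label it explicitly.
            String label = candidate.equals(Locale.ROOT)
                ? "(root bundle)"
                : candidate.toString();
            System.out.println(label);
        }
    }
}
```

Run as is, this lists en_GB, then en, then the root bundle. A missing en_GB file therefore only costs one step in the chain, which fits Karl's point that the message is a warning rather than a failure.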
