Thanks, Karl. But I would like to know how to add these files so that these warnings no longer appear and everything runs smoothly.
Is there any way to do that?

Thanks,
Nikita

On Wed, Dec 12, 2018 at 4:47 PM Karl Wright <[email protected]> wrote:

> Hi Nikita,
>
> This is occurring because en_GB does not have a translations file. It's a
> warning and the code falls back to using en_US.
>
> Karl
>
>
> On Wed, Dec 12, 2018 at 4:39 AM Nikita Ahuja <[email protected]> wrote:
>
>> Hi Karl,
>>
>> Thanks for the suggestion; the language of the data and content is now
>> detected. But there is one issue while ingesting the records into the
>> ElasticSearch index, and it is recorded in the log file as:
>>
>> ERROR 2018-12-11T19:19:37,637 (qtp348148678-561) - Missing resource bundle 'org.apache.manifoldcf.ui.i18n.common' for locale 'en_GB': Can't find bundle for base name org.apache.manifoldcf.ui.i18n.common, locale en_GB; trying en
>> java.util.MissingResourceException: Can't find bundle for base name org.apache.manifoldcf.ui.i18n.common, locale en_GB
>>     at java.base/java.util.ResourceBundle.throwMissingResourceException(Unknown Source) ~[?:?]
>>     at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>>     at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>>     at java.base/java.util.ResourceBundle.getBundle(Unknown Source) ~[?:?]
>>     at org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:132) [mcf-core.jar:?]
>>     at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) [mcf-core.jar:?]
>>     at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) [mcf-core.jar:?]
>>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:343) [mcf-ui-core.jar:?]
>>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:119) [mcf-ui-core.jar:?]
>>     at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:67) [mcf-ui-core.jar:?]
>>     at org.apache.jsp.index_jsp._jspService(index_jsp.java:212) [jsp/:?]
>>
>>
>> Can this be resolved by adding resource files, or is some other
>> solution needed?
>>
>> On Wed, Nov 21, 2018 at 5:36 PM Karl Wright <[email protected]> wrote:
>>
>>> Hi Nikita,
>>>
>>> The Tika transformer may well generate a language attribute. You would
>>> need to check with Tika, though, to know for sure, and under what
>>> conditions it might generate this. It should not be confused with document
>>> format detection, which Tika definitely does in order to extract content.
>>>
>>> It looks like language detection in Tika either comes from document
>>> metadata already present, or via a Java interface that you need to
>>> explicitly call to get it. If your documents need the latter, the Tika
>>> connector does not currently do this:
>>>
>>> https://tika.apache.org/1.19.1/detection.html#Language_Detection
>>>
>>> and
>>>
>>> https://tika.apache.org/1.19.1/examples.html#Language_Identification
>>>
>>> The documentation does not clarify whether a language attribute is
>>> actually generated; the architecture seems more suited to plug in machine
>>> translators for your content. I suspect you would need to run the output
>>> of the Tika translator into the NullOutputConnector in order to see what
>>> attributes are being generated to know for sure.
>>>
>>> Karl
>>>
>>>
>>> On Wed, Nov 21, 2018 at 4:45 AM Nikita Ahuja <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thanks for the timely replies. But I am basically concerned with
>>>> language detection for .doc, .pdf, or any other data present in the
>>>> repository.
>>>>
>>>> As per my understanding, Tika Transformation provides functionality for
>>>> the same.
>>>> But there is no output for the language of the documents.
>>>>
>>>> The sequence used is:
>>>> 1. Repository Connector (Web)
>>>> 2. Tika Transformation
>>>> 3. MetaData Adjuster
>>>> 4. Output Connector (Elastic)
>>>>
>>>> Is anything being missed here for the language detection
>>>> of the documents?
>>>>
>>>>
>>>> On Wed, Nov 21, 2018 at 2:35 PM Furkan KAMACI <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Nikita,
>>>>>
>>>>> First of all, OpenNLP is a transformation connector at ManifoldCF and
>>>>> should be enabled by default. It extracts named entities (people,
>>>>> locations and organizations) from documents.
>>>>>
>>>>> You should download trained models to run the OpenNLP connector. You can
>>>>> check here for that purpose: https://opennlp.apache.org/models.html
>>>>>
>>>>> Check here for a detailed explanation:
>>>>> https://github.com/ChalithaUdara/OpenNLP-Manifold-Connector
>>>>>
>>>>> Feel free to ask any questions when you try to integrate it. Also,
>>>>> please describe where it fails if you cannot get it to run.
>>>>>
>>>>> Kind Regards,
>>>>> Furkan KAMACI
>>>>>
>>>>>
>>>>> On Wed, Nov 21, 2018 at 11:54 AM Karl Wright <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nikita,
>>>>>>
>>>>>> Can you be more specific when you say "OpenNLP is not working"? All
>>>>>> that this connector does is integrate OpenNLP as a ManifoldCF
>>>>>> transformer. It uses a specific directory to deliver the models that
>>>>>> OpenNLP uses to match and extract content from documents. Thus, you
>>>>>> can provide any models you want that are compatible with the OpenNLP
>>>>>> version we're including.
>>>>>>
>>>>>> Can you describe the steps you are taking and what you are seeing?
>>>>>>
>>>>>> On Wed, Nov 21, 2018 at 12:44 AM Nikita Ahuja <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a query about detecting the language of the records/data
>>>>>>> that are going to be ingested into the Output Connector.
>>>>>>>
>>>>>>> I am using the OpenNLP connector for the detection as per the user
>>>>>>> documentation, but it is not working appropriately.
>>>>>>> Please suggest whether NLP has to be used and, if so, how it should
>>>>>>> be used, or whether there is any other solution for this.
>>>>>>>
>>>>>>> --
>>>>>>> Thanks and Regards,
>>>>>>> Nikita
>>>>>>> Email: [email protected]
>>>>>>> United Sources Service Pvt. Ltd.
>>>>>>> a "Smartshore" Company
>>>>>>> Mobile: +91 99 888 57720
>>>>>>> http://www.smartshore.nl
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Nikita
>>>> Email: [email protected]
>>>> United Sources Service Pvt. Ltd.
>>>> a "Smartshore" Company
>>>> Mobile: +91 99 888 57720
>>>> http://www.smartshore.nl
>>>>
>>>
>>
>> --
>> Thanks and Regards,
>> Nikita
>> Email: [email protected]
>> United Sources Service Pvt. Ltd.
>> a "Smartshore" Company
>> Mobile: +91 99 888 57720
>> http://www.smartshore.nl
>>
>

--
Thanks and Regards,
Nikita
Email: [email protected]
United Sources Service Pvt. Ltd.
a "Smartshore" Company
Mobile: +91 99 888 57720
http://www.smartshore.nl
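
For reference on the en_GB warning discussed in this thread: the sketch below prints the bundle names that the standard java.util.ResourceBundle lookup would consider for a given base name and locale, most specific first. It uses only JDK calls (ResourceBundle.Control.getCandidateLocales and toBundleName); the class and method names are made up for illustration. ManifoldCF's own Messages class wraps this lookup, and whether placing a properties file with the en_GB name on the classpath is the supported way to silence the warning is an assumption that should be checked against the ManifoldCF documentation.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of ManifoldCF.
public class BundleLookupOrder {

    // Returns the bundle names the default JDK lookup would try for the
    // given base name and locale, from most specific to least specific.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, locale).stream()
            .map(candidate -> control.toBundleName(baseName, candidate))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Base name taken from the log message in this thread; Locale.UK is en_GB.
        String baseName = "org.apache.manifoldcf.ui.i18n.common";
        for (String name : candidateNames(baseName, Locale.UK)) {
            System.out.println(name);
        }
    }
}
```

The first name printed ends in common_en_GB, which corresponds to the bundle the log reports as missing; the later names are the less specific candidates the JDK would try next.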
