Look in the ManifoldCF source tree for files named "common_en_US.properties". For each of these you will need to create a corresponding file for your specific locale (e.g. "common_en_GB.properties").
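The warning discussed below comes from the JDK's standard bundle lookup order: for locale en_GB it searches common_en_GB.properties, then common_en.properties, then common.properties, and throws MissingResourceException only if none exists. A minimal, self-contained sketch of that fallback chain (the base name "common" is illustrative; ManifoldCF's real bundles live under org.apache.manifoldcf.ui.i18n):

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleFallback {
    public static void main(String[] args) {
        // Ask the JDK for the candidate locales it would try for a
        // properties-format bundle with locale en_GB.
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<Locale> candidates =
            control.getCandidateLocales("common", new Locale("en", "GB"));

        // Prints the file names tried, most specific first:
        // common_en_GB.properties, common_en.properties, common.properties
        for (Locale l : candidates) {
            System.out.println(l.toString().isEmpty()
                ? "common.properties"
                : "common_" + l + ".properties");
        }
    }
}
```

So adding a common_en_GB.properties (or letting the lookup fall through to an existing common_en.properties) silences the warning.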
Thanks,
Karl

On Tue, Dec 18, 2018 at 2:07 AM Nikita Ahuja <[email protected]> wrote:

> Thanks Karl,
>
> But I want to know how to add these files, so that such warnings no
> longer appear and the flow runs smoothly.
>
> Is there any way to do that?
>
> Thanks,
> Nikita
>
> On Wed, Dec 12, 2018 at 4:47 PM Karl Wright <[email protected]> wrote:
>
>> Hi Nikita,
>>
>> This is occurring because en_GB does not have a translations file. It's
>> a warning, and the code falls back to using en_US.
>>
>> Karl
>>
>> On Wed, Dec 12, 2018 at 4:39 AM Nikita Ahuja <[email protected]>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for the suggestion; the language of the data and content can now
>>> be detected. But there is one issue while ingesting the records into the
>>> Elasticsearch index, recorded in the log file as:
>>>
>>> ERROR 2018-12-11T19:19:37,637 (qtp348148678-561) - Missing resource
>>> bundle 'org.apache.manifoldcf.ui.i18n.common' for locale 'en_GB': Can't
>>> find bundle for base name org.apache.manifoldcf.ui.i18n.common, locale
>>> en_GB; trying en
>>> java.util.MissingResourceException: Can't find bundle for base name
>>> org.apache.manifoldcf.ui.i18n.common, locale en_GB
>>>   at java.base/java.util.ResourceBundle.throwMissingResourceException(Unknown Source) ~[?:?]
>>>   at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>>>   at java.base/java.util.ResourceBundle.getBundleImpl(Unknown Source) ~[?:?]
>>>   at java.base/java.util.ResourceBundle.getBundle(Unknown Source) ~[?:?]
>>>   at org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:132) [mcf-core.jar:?]
>>>   at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) [mcf-core.jar:?]
>>>   at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) [mcf-core.jar:?]
>>>   at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:343) [mcf-ui-core.jar:?]
>>>   at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:119) [mcf-ui-core.jar:?]
>>>   at org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:67) [mcf-ui-core.jar:?]
>>>   at org.apache.jsp.index_jsp._jspService(index_jsp.java:212) [jsp/:?]
>>>
>>> Can this be resolved by adding resource files, or does some other
>>> solution have to be adopted?
>>>
>>> On Wed, Nov 21, 2018 at 5:36 PM Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Nikita,
>>>>
>>>> The Tika transformer may well generate a language attribute. You would
>>>> need to check with Tika, though, to know for sure, and under what
>>>> conditions it might generate this. It should not be confused with document
>>>> format detection, which Tika definitely does in order to extract content.
>>>>
>>>> It looks like language detection in Tika either comes from document
>>>> metadata already present, or via a Java interface that you need to
>>>> explicitly call to get it. If your documents need the latter, the Tika
>>>> connector does not currently do this:
>>>>
>>>> https://tika.apache.org/1.19.1/detection.html#Language_Detection
>>>>
>>>> and
>>>>
>>>> https://tika.apache.org/1.19.1/examples.html#Language_Identification
>>>>
>>>> The documentation does not clarify whether a language attribute is
>>>> actually generated; the architecture seems more suited to plugging in
>>>> machine translators for your content. I suspect you would need to run the
>>>> output of the Tika transformer into the NullOutputConnector in order to
>>>> see what attributes are being generated, to know for sure.
>>>>
>>>> Karl
>>>>
>>>> On Wed, Nov 21, 2018 at 4:45 AM Nikita Ahuja <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Thanks for the timely replies. But I am basically concerned with the
>>>>> language detection of the .doc, .pdf, or any other data present in the
>>>>> repository.
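The "Java interface that you need to explicitly call" that Karl mentions can be tried outside ManifoldCF. A hedged sketch against the Tika 1.19 language-detection API, assuming the tika-core and tika-langdetect jars (and their Optimaize dependency) are on the classpath; the sample text is obviously illustrative:

```java
import java.io.IOException;

import org.apache.tika.langdetect.OptimaizeLangDetector;
import org.apache.tika.language.detect.LanguageDetector;
import org.apache.tika.language.detect.LanguageResult;

public class DetectLanguage {
    public static void main(String[] args) throws IOException {
        // Load the bundled Optimaize models once; the detector is reusable.
        LanguageDetector detector = new OptimaizeLangDetector().loadModels();

        // Run detection on extracted plain text, e.g. the output of the
        // Tika transformer, and report the ISO 639-1 code plus confidence.
        LanguageResult result =
            detector.detect("This is a small sample of English text.");
        System.out.println(result.getLanguage() + " (" + result.getConfidence() + ")");
    }
}
```

Since the Tika transformation connector does not invoke this interface, the detected code would have to be attached to the document (for instance via the Metadata Adjuster, or a custom transformer) before it reaches Elasticsearch.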
>>>>>
>>>>> As per my understanding, the Tika transformation provides functionality
>>>>> for this, but there is no output for the language of the documents.
>>>>>
>>>>> The sequence used is:
>>>>> 1. Repository Connector (Web)
>>>>> 2. Tika Transformation
>>>>> 3. Metadata Adjuster
>>>>> 4. Output Connector (Elasticsearch)
>>>>>
>>>>> Is there anything being missed here for the language detection of the
>>>>> documents?
>>>>>
>>>>> On Wed, Nov 21, 2018 at 2:35 PM Furkan KAMACI <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nikita,
>>>>>>
>>>>>> First of all, OpenNLP is a transformation connector in ManifoldCF and
>>>>>> should be enabled by default. It extracts named entities (people,
>>>>>> locations, and organizations) from documents.
>>>>>>
>>>>>> You need to download trained models to run the OpenNLP connector. You
>>>>>> can check here for that purpose: https://opennlp.apache.org/models.html
>>>>>>
>>>>>> Check here for a detailed explanation:
>>>>>> https://github.com/ChalithaUdara/OpenNLP-Manifold-Connector
>>>>>>
>>>>>> Feel free to ask any questions when you try to integrate it. Also,
>>>>>> please describe where you are stuck if you cannot get it running.
>>>>>>
>>>>>> Kind Regards,
>>>>>> Furkan KAMACI
>>>>>>
>>>>>> On Wed, Nov 21, 2018 at 11:54 AM Karl Wright <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Nikita,
>>>>>>>
>>>>>>> Can you be more specific when you say "OpenNLP is not working"? All
>>>>>>> that this connector does is integrate OpenNLP as a ManifoldCF transformer.
>>>>>>> It uses a specific directory to deliver the models that OpenNLP uses to
>>>>>>> match and extract content from documents. Thus, you can provide any models
>>>>>>> you want that are compatible with the OpenNLP version we're including.
>>>>>>>
>>>>>>> Can you describe the steps you are taking and what you are seeing?
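The pretrained models Furkan points to above can be exercised standalone before wiring anything into the connector, which helps separate "model problem" from "connector problem". A hedged sketch using OpenNLP's language-detector API (1.8.3+); the model file name "langdetect-183.bin" is an assumption here, standing in for whichever model was downloaded from the models page:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.langdetect.Language;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetectorModel;

public class OpenNlpLangDetect {
    public static void main(String[] args) throws IOException {
        // Pretrained model downloaded from https://opennlp.apache.org/models.html;
        // the file name below is a placeholder for whatever you fetched.
        try (InputStream in = new FileInputStream("langdetect-183.bin")) {
            LanguageDetectorModel model = new LanguageDetectorModel(in);
            LanguageDetectorME detector = new LanguageDetectorME(model);

            // predictLanguage returns the best-scoring language; note that
            // this model reports ISO 639-3 codes such as "eng" or "nld".
            Language best =
                detector.predictLanguage("Dit is een voorbeeldzin in het Nederlands.");
            System.out.println(best.getLang() + " " + best.getConfidence());
        }
    }
}
```

If this works on sample text but the connector still produces nothing, the problem is likely in the model directory configuration Karl describes rather than in OpenNLP itself.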
>>>>>>>
>>>>>>> On Wed, Nov 21, 2018 at 12:44 AM Nikita Ahuja <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a query related to detecting the language of the records/data
>>>>>>>> which are going to be ingested into the output connector.
>>>>>>>>
>>>>>>>> The OpenNLP connector is being used for the detection as per the
>>>>>>>> user documentation, but it is not working appropriately. Please
>>>>>>>> suggest whether NLP has to be used; if yes, how should it be used,
>>>>>>>> or is there any other solution for this?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks and Regards,
>>>>>>>> Nikita
>>>>>>>> Email: [email protected]
>>>>>>>> United Sources Service Pvt. Ltd.
>>>>>>>> a "Smartshore" Company
>>>>>>>> Mobile: +91 99 888 57720
>>>>>>>> http://www.smartshore.nl
