[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363422#comment-16363422 ]

Irindu Nugawela commented on CONNECTORS-1490:

Hi,

Thanks for the quick reply. What activities do we need to support in addition to document ingestion and document deletion?

I've seen that MongoDB is a NoSQL, document-oriented database program. I've also seen a few other implementations of the OutputConnector interface, such as KafKaConnector and GTSConnector. Can I use any of those as a reference for my implementation, or do they have nothing in common with a MongoDB connector beyond implementing the same interface? If there is a good reference implementation that would help me understand the task quickly, could you please suggest one?

> GSOC: MongoDB Output Connector
>
> Key: CONNECTORS-1490
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1490
> Project: ManifoldCF
> Issue Type: New Feature
> Components: MongoDB Output Connector
> Reporter: Piergiorgio Lucidi
> Assignee: Piergiorgio Lucidi
> Priority: Major
> Labels: MongoDB, gsoc2018, java, junit
> Original Estimate: 480h
> Remaining Estimate: 480h
>
> This is a project idea for [Google Summer of Code|https://summerofcode.withgoogle.com/] (GSOC).
> To discuss this or other ideas with your potential mentor from the Apache ManifoldCF project, sign up and post to the dev@manifoldcf.apache.org list, including "[GSOC]" in the subject. You may also comment on this Jira issue if you have created an account.
> We would like to extend the Content Migration capabilities by adding MongoDB / GridFS as a new output connector for importing content from one or more repositories supported by ManifoldCF. In this way we will help developers migrate content from different data sources to MongoDB.
> You will be involved in the development of the following tasks, and you will learn how to:
> * Write the connector implementation
> * Implement unit tests
> * Build all the integration tests for testing the connector inside the framework
> * Write the documentation for this connector
> We have complete documentation on how to implement an Output Connector:
> [https://manifoldcf.apache.org/release/release-2.9.1/en_US/writing-output-connectors.html]
> Also take a look at our book to better understand the framework and how to implement connectors:
> [https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs]
>
> Prospective GSOC mentor:
> [piergior...@apache.org|mailto:piergior...@apache.org]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362991#comment-16362991 ]

Shinichiro Abe commented on CONNECTORS-1490:

Hi,

You would need to implement the Mongo output connector by extending [BaseOutputConnector|https://github.com/apache/manifoldcf/blob/trunk/framework/agents/src/main/java/org/apache/manifoldcf/agents/output/BaseOutputConnector.java].
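The two activities named in this thread, document ingestion and document deletion, can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: `SketchOutputConnector`, `InMemoryMongoSketch`, and their method names are hypothetical stand-ins and are not the ManifoldCF signatures, and an in-memory map takes the place of a MongoDB collection. The real contract is the one described in the writing-output-connectors documentation linked in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an output-connector contract; NOT the ManifoldCF API.
interface SketchOutputConnector {
    // Ingestion: store the document, replacing any earlier version with the same URI.
    void addOrReplaceDocument(String documentURI, Map<String, String> fields);

    // Deletion: drop the document with the given URI if it exists.
    void removeDocument(String documentURI);
}

// In-memory stand-in for a MongoDB collection keyed by document URI (assumption).
class InMemoryMongoSketch implements SketchOutputConnector {
    private final Map<String, Map<String, String>> collection = new LinkedHashMap<>();

    @Override
    public void addOrReplaceDocument(String documentURI, Map<String, String> fields) {
        // Upsert semantics: a copy of the fields overwrites whatever was stored before.
        collection.put(documentURI, new LinkedHashMap<>(fields));
    }

    @Override
    public void removeDocument(String documentURI) {
        // Removing an unknown URI is a no-op.
        collection.remove(documentURI);
    }

    // Number of documents currently stored.
    int size() {
        return collection.size();
    }

    // Stored fields for a URI, or null if the URI is unknown.
    Map<String, String> get(String documentURI) {
        return collection.get(documentURI);
    }
}
```

Keying the store by document URI is a design choice of this sketch: it makes re-ingesting a changed document an overwrite rather than a duplicate, which is the behaviour one would probably want from a migration target.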
[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362502#comment-16362502 ]

Irindu Nugawela commented on CONNECTORS-1490:

Hi Piergiorgio Lucidi,

I went through your presentation and Shinichiro Abe's presentation. From them I gathered that there are three types of connectors:

1. Repository Connectors
2. Authority Connectors
3. Output Connectors

and that I have to write a repository connector for MongoDB. Please correct me if I am wrong.
Re: Google Summer of Code 2018 is coming
Thanks, Piergiorgio! I too am way too overcommitted to mentor this year, so I'm glad you are doing it. Thanks again!

Karl

On Tue, Feb 13, 2018 at 8:20 AM, Piergiorgio Lucidi wrote:

> Thank you Rafa and it seems that we have a guy that would like to implement the new MongoDB Output Connector:
> https://issues.apache.org/jira/browse/CONNECTORS-1490
>
> I will help him to start the development.
>
> PJ
[jira] [Commented] (CONNECTORS-1494) Error crawling file system with file names having special characters.
[ https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362308#comment-16362308 ]

Karl Wright commented on CONNECTORS-1494:

Thanks for the update! We'll recommend that action the next time somebody trips over that problem.

> Error crawling file system with file names having special characters.
>
> Key: CONNECTORS-1494
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1494
> Project: ManifoldCF
> Issue Type: Bug
> Components: File system connector
> Affects Versions: ManifoldCF 2.9.1
> Reporter: Vinay
> Assignee: Karl Wright
> Priority: Critical
> Fix For: ManifoldCF 2.10
>
> I am crawling a file system mounted on a Linux machine, so the Repository Connection is of type "File System". ManifoldCF does not pick up some files whose names contain special characters.
> File ex: a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf
> exception: java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_151]
> at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_151]
> at java.lang.Long.<init>(Long.java:965) ~[?:1.8.0_151]
> at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(DocumentFilter.java:513) ~[?:?]
> at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(DocumentFilter.java:76) ~[?:?]
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(IncrementalIngester.java:503) ~[mcf-agents.jar:?]
> at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(PipelineSpecification.java:47) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:308) [mcf-pull-agent.jar:?]
> FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string: ""
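The exception text in the trace, `For input string: ""`, is what the JDK reports when an empty string is parsed as a long. The sketch below only reproduces that message in isolation; it makes no claim about why ManifoldCF ended up with an empty value. `EmptyLongParse` and `parseOrDefault` are hypothetical names introduced here, and the fallback behaviour is an assumption for illustration, not something the project does.

```java
// Minimal reproduction of the NumberFormatException message from the stack trace.
class EmptyLongParse {
    // Returns the exception message if the text cannot be parsed, or null if it can.
    static String parseFailureMessage(String text) {
        try {
            Long.parseLong(text);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    // Hypothetical defensive variant: blank or null input yields the fallback
    // instead of an exception; anything else is parsed after trimming.
    static long parseOrDefault(String text, long fallback) {
        if (text == null || text.trim().isEmpty()) {
            return fallback;
        }
        return Long.parseLong(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(parseFailureMessage(""));
        System.out.println(parseOrDefault("", -1L));
    }
}
```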
Re: Google Summer of Code 2018 is coming
Thank you Rafa. It seems that we have someone who would like to implement the new MongoDB Output Connector:
https://issues.apache.org/jira/browse/CONNECTORS-1490

I will help him to start the development.

PJ

2018-01-30 23:59 GMT+01:00 Rafa Haro:

> Hi Piergiorgio, all,
>
> I have been super busy lately (last few months) so I'm a little bit disconnected at this moment. Actually I still have in my TODO list finishing the nuxeo connector. Anyway, in a month or so I hope to be a little bit less occupied so I could be helping at least as co-mentor.
>
> Just raising half of a hand here :-)
>
> On Tue, Jan 30, 2018 at 8:44, Piergiorgio Lucidi <piergior...@apache.org> wrote:
>
> > Ok guys,
> >
> > this is the last day, if it is ok for you I can add all these ideas or just a few.
> >
> > Please share your comments about this.
> > This is a great opportunity to involve other people.
> >
> > Thank you.
> >
> > Cheers,
> > PJ
> >
> > 2018-01-24 20:21 GMT+01:00 Piergiorgio Lucidi:
> >
> > > Some ideas from different projects are coming in JIRA:
> > > https://issues.apache.org/jira/issues/?filter=12343065
> > >
> > > PJ
> > >
> > > 2018-01-24 11:26 GMT+01:00 Piergiorgio Lucidi:
> > >
> > >> Hi,
> > >>
> > >> Google Summer Of Code 2018 is starting and we are going to confirm the Apache participation also this year.
> > >> We could join the program as usual creating some improvements / tasks for ManifoldCF with a specific JIRA tag "gsoc2018".
> > >>
> > >> Do you have any ideas to include in our proposals?
> > >> I can just start the discussion with some ideas below:
> > >>
> > >> 1. Start our DevOps adoption: some ready-to-run ManifoldCF Docker images on different stack: MySQL, Postgres, etc...
> > >> 2. Improve our Cloud services storage adoption: add support for Azure and Amazon storage for both Repository and Output Connectors
> > >> 3. Continue our work on Content Migration: making sure that all the repository connectors correctly work with the existent output connectors
> > >> 4. MongoDB Output Connector
> > >> 5. A brand new website template: a porting based on Jekyll? (maybe is not so interesting for a student... but anyway we should do that :-P)
> > >>
> > >> Please feel free to add any comment about this.
> > >> Thank you.
> > >>
> > >> Cheers,
> > >> PJ
> > >>
> > >> -- Forwarded message --
> > >> From: Ulrich Stärk
> > >> Date: 2018-01-21 22:22 GMT+01:00
> > >> Subject: Google Summer of Code 2018 is coming
> > >> To: ment...@community.apache.org
> > >>
> > >> Hello PMCs (incubator Mentors, please forward this email to your podlings),
> > >>
> > >> Google Summer of Code [1] is a program sponsored by Google allowing students to spend their summer working on open source software. Students will receive stipends for developing open source software full-time for three months. Projects will provide mentoring and project ideas, and in return have the chance to get new code developed and - most importantly - to identify and bring in new committers.
> > >>
> > >> The ASF will apply as a participating organization meaning individual projects don't have to apply separately.
> > >>
> > >> If you want to participate with your project we ask you to do the following things as soon as possible but please no later than 2017-01-30:
> > >>
> > >> 1. understand what it means to be a mentor [2].
> > >>
> > >> 2. record your project ideas.
> > >>
> > >> Just create issues in JIRA, label them with gsoc2018, and they will show up at [3]. Please be as specific as possible when describing your idea. Include the programming language, the tools and skills required, but try not to scare potential students away. They are supposed to learn what's required before the program starts.
> > >>
> > >> Use labels, e.g. for the programming language (java, c, c++, erlang, python, brainfuck, ...) or technology area (cloud, xml, web, foo, bar, ...) and record them at [5].
> > >>
> > >> Please use the COMDEV JIRA project for recording your ideas if your project doesn't use JIRA (e.g. httpd, ooo). Contact d...@community.apache.org if you need assistance.
> > >>
> > >> [4] contains some additional information (will be updated for 2017 shortly).
> > >>
> > >> 3. subscribe to ment...@community.apache.org; restricted to potential mentors, meant to be used as a private list - general discussions on the public d...@community.apache.org list as much as possible please). Use a recognized address when subscribing (@apache.org or one of your alias addresses on record).
> > >>
> > >> Note that the ASF isn't accepted as a participating organization yet,
[jira] [Commented] (CONNECTORS-1490) GSOC: MongoDB Output Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362299#comment-16362299 ]

Piergiorgio Lucidi commented on CONNECTORS-1490:

Hi [~irinduPera],

thank you for your interest and welcome to the ManifoldCF community!

I also saw your private message and I'll reply to you soon.
[jira] [Comment Edited] (CONNECTORS-1494) Error crawling file system with file names having special characters.
[ https://issues.apache.org/jira/browse/CONNECTORS-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358161#comment-16358161 ]

Vinay edited comment on CONNECTORS-1494 at 2/13/18 12:16 PM:

Thanks Karl. I finally figured out the solution. I had to change the default locale configuration for Linux. I edited /etc/sysconfig/i18n and changed LANG="en_US.ISO8859". Now it is picking up those files.

was (Author: vinaybs...@gmail.com):
Thanks Karl. I finally figured out the solution. I had to change the default locale configuration for Linux. I edited /etc/sysconfig/i18n and changed LANG="en_US.UTF-8". Now it is picking up those files.
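One way to see why a locale setting can matter here: on the Java 8 runtime shown in the stack trace, the JVM's default charset is normally derived from the operating system locale, and a charset may or may not be able to represent every character in a file name. The sketch below is only an illustration of that idea; the thread does not describe the actual mechanism inside ManifoldCF. `FileNameCharsetCheck` and the sample file name are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Checks whether a file name can be represented in a given charset.
class FileNameCharsetCheck {
    // True if every character of the name can be encoded in the charset.
    static boolean canEncode(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        // The platform default depends on the environment the JVM was started in.
        Charset platform = Charset.defaultCharset();
        // Sample name with two accented characters (written as escapes).
        String name = args.length > 0 ? args[0] : "r\u00e9sum\u00e9.pdf";
        System.out.println("Default charset: " + platform);
        System.out.println("Encodable in default charset: " + canEncode(name, platform));
        System.out.println("Encodable in US-ASCII: " + canEncode(name, StandardCharsets.US_ASCII));
    }
}
```

Running it with the problematic file name as the argument, under the same environment as the crawler, would show whether the name survives the platform's default charset.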