Re: Speculating about the removal of the standalone Solr mode
Personally, I think this is a very good idea. At Constellio we use SolrCloud and ZooKeeper for sharding. I think the approach you suggest will simplify and normalize the code.

Regards.

On 2016-03-09 3:57 PM, "Shawn Heisey" wrote:

> On 3/9/2016 10:43 AM, Joel Bernstein wrote:
> > From Alfresco's point of view this would be a bad thing. Alfresco uses
> > Solr in stand alone mode and has developed an entire sharding and
> > replication model that fits the ECM use case. So being forced to have
> > ZooKeeper and Solr Cloud would not be ideal.
>
> I'm aware of the potential pain. Third-party Solr support and the
> documentation that goes with it might require significant changes -- but
> those changes will already be required if those packages want to add
> support for talking to SolrCloud in general.
>
> I firmly believe that "cloud mode only" is the way Solr is headed, and
> that once we reach the other side, Solr will be better, especially
> because of documentation and API consolidation.
>
> My intent would not be to force SolrCloud's built-in sharding on
> everyone. You could still do completely manual sharding and work with
> individual Solr nodes like before. The difference would be that each
> "standalone" Solr node would internally use zookeeper (probably the
> embedded server) to manage itself. We might need to invent a
> "standalone collection" concept that could be used for
> single-shard-single-replica collections, where the core name and the
> collection name are the same, instead of cores named foo_shardN_replicaN.
>
> I myself would feel a lot of the pain you mentioned in relation to
> Alfresco. I'm also manually managing shards on standalone Solr
> instances. I've got a significant investment in a SolrJ application to
> handle these indexes.
>
> The change should not happen before 7.0, and with a major change like
> that on the horizon, the major version *before* the change (such as 6.x)
> should remain the stable branch for quite a while, so everybody has time
> to update and support cloud mode before it becomes mandatory. There
> will be a lot of details to iron out. I hope to be able to help with that.
>
> Thanks,
> Shawn
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
The Constellio team is proud to release its version 1.2
The Constellio Enterprise Search team is proud to launch version 1.2 of its powerful Open Source search engine software. Based on Google Search Appliance connectors and Apache Solr, Constellio makes it possible to find all available information with a single search. The software also complies with the security requirements of the organisation.

This new version brings significant improvements to the search engine. Constellio is now CMIS compliant, which allows it to connect to any system that supports this standard, such as Alfresco, Nuxeo, Liferay, Drupal, etc.

We also want to thank everyone who has downloaded Constellio.

Many important changes have been made in this new version. Here are some of the new features developed in version 1.2:

• CMIS connector
• XML connector using the XPath language
• Alert mechanism and RSS feeds
• Google OneBox technology integration
• Automatic recording of search history
• Saving of favorite documents
• Saving of favorite queries
• Improvement of the learning mechanism used in the relevance calculation of search results
• Sitemaps protocol support for HTTP connectors
• Multiselect facets using Ajax technology

--
Rida Benjelloun
Constellio - Doculibre
ridabenjell...@apache.org
rida.benjell...@doculibre.com
The Constellio team is proud to release its version 1.1
Constellio Open Source Enterprise Search is based on Apache Solr and uses the Google Search Appliance connector architecture. With a single click, it finds all relevant content in your organization (Web, email, ECM, CRM, etc.).

Please be advised that the Constellio licence has been changed from GPL v3.0 to LGPL v3.0. The new LGPL v3.0 licence gives more flexibility to developers interested in developing plug-ins/modules or in integrating Constellio with other solutions. The SVN repository (svn.constellio.com) and the issue tracker (issues.constellio.com) are now also open.

Many important changes have been made in this new version. Here are some of the new features developed in version 1.1:

- Constellio multi-platform installer
- Federated search
- Document security
- Autocomplete for simple search, based on the most popular queries
- Configurable advanced search interface and autocomplete based on field content
- Solr connector (upload your schema.xml and content files, XML and binary)
- Activation of Solr HTTP Web services, with the Constellio spell checker made available through these services
- Implementation of multiselect faceting
- Configuration of display fields
- Document views used in the relevance calculation of search results
- Field boost, document boost, and Solr dismax (relevance)
- Carrot2 for faceting
- Web crawler improvements
- New theme
- and more ...

Your comments and suggestions are also welcome.

--
Rida Benjelloun
Constellio - Doculibre
ridabenjell...@apache.org
rida.benjell...@doculibre.com
Constellio Enterprise Search announces its first Open Source release
The Constellio team is proud to announce the release of the first Open Source version of Constellio Enterprise Search. It is available for download at the following address: http://www.constellio.com

Based on Apache Solr and using Google Search Appliance's connector architecture, Constellio provides the solution to index all sources of information in your business.

Some key features:

• Administration interface
• Security management
• Federated search of all data of the organization
• Discovery tool (faceted search)
• Real time indexing; Web, file system, mail, database, ECM and CRM crawlers
• Search engine collaboration (tagging, best bets, synonyms)
• Machine learning, classification and clustering
• Thesaurus and taxonomy support using OWL and SKOS
• Entity extraction
• Multilingual spell checker
• Reports and statistics on indexing and search
• Supports multilingual interface (I18N)
• Supports over 15 languages (with stemming)
• Documents and fields boosting
• Search results sorting
• And more…

-
Rida Benjelloun
ridabenjell...@apache.org
rida.benjell...@doculibre.com
Re: [VOTE] Apache Tika 0.3 release candidate 2
Hi, +1 Release the packages as Apache Tika 0.3. Regards. 2009/3/15 Jukka Zitting jukka.zitt...@gmail.com Hi, On Fri, Mar 13, 2009 at 8:43 PM, Mattmann, Chris A chris.a.mattm...@jpl.nasa.gov wrote: Please vote on releasing these packages as Apache Tika 0.3. The vote is open for the next 72 hours. Only votes from Lucene PMC are binding, but everyone is welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast. [x] +1 Release the packages as Apache Tika 0.3. BR, Jukka Zitting
Contributing Lius to Lucene
Hi,

In February 2007, I sent a message to the Lucene dev list proposing that Lius enter incubation at Apache. Several solutions were proposed, among them adding Lius under Lucene contrib. The solution we retained is to start the Tika project in the Incubator and to contribute the Lius parsers to it. Version 0.1 of Tika will be ready in 2 or 3 weeks. I will integrate Tika into Lius and refactor it, and I would like to contribute Lius to Lucene.

Lius is currently downloaded 400 times per month. Lius is under the ASF licence.

Lius on SourceForge: http://sourceforge.net/projects/lius/

Regards.
Rida Benjelloun [EMAIL PROTECTED]
[jira] Updated: (SOLR-209) Multifields and multivalued facets
[ https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rida Benjelloun updated SOLR-209:
    Attachment: MultiFieldsHitsFacets.java

Java class

Multifields and multivalued facets
    Key: SOLR-209
    URL: https://issues.apache.org/jira/browse/SOLR-209
    Project: Solr
    Issue Type: Improvement
    Components: search
    Affects Versions: 1.1.0
    Environment: Java
    Reporter: Rida Benjelloun
    Fix For: 1.1.0
    Attachments: MultiFieldsHitsFacets.java, MultiFieldsHitsFacets.patch

MultiFieldsHitsFacets increases the performance of faceting on multiValued fields by creating facets from Lucene Hits. It also allows the creation of facets using multiple fields. The fields must be separated by a single space.
Example: facet.field=subject subjectG subjectA

Rida Benjelloun [EMAIL PROTECTED]

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-209) Multifields and multivalued facets
[ https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rida Benjelloun updated SOLR-209:
    Attachment: MultiFieldsHitsFacets.patch

Multifields and multivalued facets
    Key: SOLR-209
    URL: https://issues.apache.org/jira/browse/SOLR-209
    Project: Solr
    Issue Type: Improvement
    Components: search
    Affects Versions: 1.1.0
    Environment: Java
    Reporter: Rida Benjelloun
    Fix For: 1.1.0
    Attachments: MultiFieldsHitsFacets.java, MultiFieldsHitsFacets.patch

MultiFieldsHitsFacets increases the performance of faceting on multiValued fields by creating facets from Lucene Hits. It also allows the creation of facets using multiple fields. The fields must be separated by a single space.
Example: facet.field=subject subjectG subjectA

Rida Benjelloun [EMAIL PROTECTED]
[jira] Commented: (SOLR-209) Multifields and multivalued facets
[ https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488754 ]

Rida Benjelloun commented on SOLR-209:

Hi Yonik,

MultiFieldsHitsFacets uses Lucene hits to create the facets. For each field on which we want to facet, I get the field content from the Lucene document in the search results and check whether it is already a key in the HashMap. If not, I add it as a key; otherwise I get the Map value, which is an Integer, and increment it. Finally I sort the HashMap.

With this approach you can tokenize your fields without any impact on the faceting mechanism, because I use the stored string content to build the facets. However, the field must be stored.

Regards.
Rida Benjelloun

Multifields and multivalued facets
    Key: SOLR-209
    URL: https://issues.apache.org/jira/browse/SOLR-209
    Project: Solr
    Issue Type: Improvement
    Components: search
    Affects Versions: 1.1.0
    Environment: Java
    Reporter: Rida Benjelloun
    Fix For: 1.1.0
    Attachments: MultiFieldsHitsFacets.java, MultiFieldsHitsFacets.patch

MultiFieldsHitsFacets increases the performance of faceting on multiValued fields by creating facets from Lucene Hits. It also allows the creation of facets using multiple fields. The fields must be separated by a single space.
Example: facet.field=subject subjectG subjectA

Rida Benjelloun [EMAIL PROTECTED]
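
The counting step described in the comment above can be sketched roughly as follows. This is not the attached MultiFieldsHitsFacets class: the class name `HitsFacetSketch`, the method `countFacets`, and the modelling of each hit as a plain map of stored field values are hypothetical stand-ins so that the example runs without Lucene or Solr. It also assumes that values from all fields listed in facet.field are counted together in one facet, which the issue text does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HitsFacetSketch {

    /**
     * Counts stored field values over the matching documents.
     * Each hit is modelled as "field name -> stored values" (hypothetical
     * stand-in for the stored fields of a Lucene document).
     */
    static Map<String, Integer> countFacets(List<Map<String, List<String>>> hits,
                                            String facetFieldParam) {
        Map<String, Integer> counts = new HashMap<>();
        // facet.field lists the fields separated by a single space
        for (String field : facetFieldParam.trim().split(" ")) {
            for (Map<String, List<String>> doc : hits) {
                for (String value : doc.getOrDefault(field, Collections.<String>emptyList())) {
                    // new key starts at 1, existing key is incremented
                    counts.merge(value, 1, Integer::sum);
                }
            }
        }
        // Sort by descending count, then alphabetically for stable output
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc1 = new HashMap<>();
        doc1.put("subject", Arrays.asList("search", "xml"));
        Map<String, List<String>> doc2 = new HashMap<>();
        doc2.put("subject", Arrays.asList("search"));
        doc2.put("subjectG", Arrays.asList("lucene"));

        List<Map<String, List<String>>> hits = Arrays.asList(doc1, doc2);
        System.out.println(countFacets(hits, "subject subjectG"));
    }
}
```

Because the counts come from stored values of the hits only, the cost grows with the number of matching documents, which is why the comment stresses that the field must be stored.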
Re: [jira] Lius into apache incubator
Hi,

You can already use Lius as a text extraction API: for each Indexer I have implemented a method that returns the String content of the document. Lius could be used as a starting point for the Tika project, if the Tika committers are interested in it. We can also, as Mark said, decouple Lius's parser logic from its indexing logic. Taking the project into the Apache Incubator could also be interesting, to get more people involved in it. My goal is to join our efforts to build a framework for text extraction.

Here is an example of text extraction with Lius:

    LiusConfig lc = LiusConfigBuilder.getSingletonInstance().getLiusConfig(liusConfigPathString);
    Indexer indexer = IndexerFactory.getIndexer(documentToIndex, lc);
    // getContent() is called on the indexer instance obtained above
    String text = indexer.getContent();

On 3/1/07, Jukka Zitting [EMAIL PROTECTED] wrote:
> Hi,
>
> I am interested in a Lius/Tika project that could be used not only with Lucene. As mentioned by Mark, there are a number of related efforts which leads me to believe a application-independent content analysis/parsing tool would be very helpful for many users.
>
> I'd like to propose taking the project to the Apache Incubator to better attract interest also from outside Lucene.
>
> BR,
> Jukka Zitting
>
> --
> View this message in context: http://www.nabble.com/Lius-into-apache-incubator-tf3145937.html#a9247508
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: [jira] Lius into apache incubator
Hi,

On 3/1/07, Jukka Zitting [EMAIL PROTECTED] wrote:
> Hi,
>
> On 3/1/07, Rida Benjelloun [EMAIL PROTECTED] wrote:
> > Lius could be used as a starting point of Tika project, if Tika committers are interested in it. We can also, as Mark said, decouple Lius's parser logic from its indexing logic.
>
> I'm very interested in doing that. Another very useful codebase, among others, would be the existing parser framework in the Nutch project.

-- I agree

> > Taking the project into Apache incubator could be also interesting, to get more people involved in it.
>
> Exactly. I'd like to avoid starting just yet another codebase, and focus more on bringing the best parts (both code and ideas) of the existing projects together. The community-building focus of the Incubator would likely help with that. Another aspect that would benefit from the Incubator scrutiny are the legal implications of pulling together multiple document parser libraries under various different licenses.
>
> Would there be interest within the Lucene PMC in sponsoring a proposal along such lines? I can volunteer to put together the proposal and act as the champion and mentor of the project.

-- We can put together the proposal and you can be the mentor of the project.

> BR,
> Jukka Zitting

--
Rida Benjelloun, M.S.I., M.B.A.
President and CEO
DocuLibre inc.
Telephone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]
Re: [jira] Lius into apache incubator
Hi,

Thanks Doug, I think your help as a mentor will be very much appreciated.

Regards.

On 3/1/07, Doug Cutting [EMAIL PROTECTED] wrote:
> Jukka Zitting wrote:
> > PS. Will people mind if we use this list for fleshing out the details? I've created a Google Group for Tika where we could also take the discussion if that's preferred.
>
> I think the Incubator Wiki would be the best place for this.
>
> http://wiki.apache.org/incubator/?action=fullsearch&value=proposal&titlesearch=Titles
>
> Interested folks could subscribe to the proposal page. You could announce the proposal page on several lists. Will that work for you?
>
> Also, I can probably help as a mentor if needed.
>
> Doug
Re: Lius into apache incubator
Hi Otis,

Many thanks for your comments, and I'm sorry for this late answer. I will add Lius as a Lucene contrib and I will change the licence to ASL. There are some developers contributing to Lius, but they are not very active.

On the question "this is a Laval University project, right? But you work at DocuLibre?": I developed Lius during my studies at Laval University and I am still the copyright owner of the project, so I can change the licence to ASL without any problem. Lius has been used in several projects at Laval University, and I decided to host it at Laval. I work both at Laval and at Doculibre.

Tika is a really good project and I'm really interested in joining it.

Regards.

On 1/31/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Hi Rida,
>
> Some comments in no particular order:
>
> - Looks useful
> - This looks like a more expanded version of what Erik and I wrote for LIA, and I know people often ask and use that code, so I know there is a need for a framework that knows how to parse various document formats
> - Nutch has some of the document parsing code written in form of plugins. A few people wanted to decouple that from Nutch in a Tika project: http://code.google.com/p/tika/ . Not sure what the status is, I think only Jukka Zitting did any work there, but I think the initial idea was never fully finished. If LIUS joins Lucene, I think some of this duplication should be cleaned up, so we have only one framework for parsing various types of document formats.
> - Going through the Incubator is one way to go. Perhaps another way to get LIUS under Lucene is to just place it under contrib/, say contrib/lius.
> - Licensing would have to change to ASL and you would probably also have to send in your ASF CLA.
> - Any dependencies on GPL or LGPL or code released under other licenses would have to either be removed, or you'd have to fetch the required Jars at compile/build time. A few projects under Lucene contrib/ already do that, I believe
> - Are there developers who are actively working on LIUS? Fixing bugs, adding features, keeping up with new versions of dependencies, etc.
>
> Otis
>
> P.S. Out of curiosity - this is a Laval University project, right? But you work at DocuLibre?
>
> - Original Message
> From: Rida Benjelloun [EMAIL PROTECTED]
> To: java-user@lucene.apache.org; java-dev@lucene.apache.org
> Sent: Tuesday, January 30, 2007 7:27:28 PM
> Subject: Lius into apache incubator
>
> Hi,
> I would like to add the Lius framework (http://sourceforge.net/projects/lius/) to the Apache Incubator. Are there any volunteers to do this job and to contribute to the development of this project?
> Thanks.
> Rida Benjelloun.
Phrase query analysis-fr
Hi, When I use analysis-fr for indexing and searching, I'm not able to search by phrase query. I'm using nutch-0.8.1. Could someone help ? Best regards
[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
    Attachment: parse-xml.zip

Hi,
The parse-xml plugin has been updated. I have tested it with version 0.8.1. The plugin also fixes the bug related to multi-valued fields.
Best regards
Rida Benjelloun. [EMAIL PROTECTED]

XMLParser is configurable xml parser plugin.
    Key: NUTCH-185
    URL: http://issues.apache.org/jira/browse/NUTCH-185
    Project: Nutch
    Issue Type: New Feature
    Components: fetcher, indexer
    Affects Versions: 0.7.2
    Environment: OS Independent
    Reporter: Rida Benjelloun
    Attachments: parse-xml.zip, parse-xml.zip

Xml parser is a configurable plugin. It uses XPath and namespaces to do the mapping between the XML elements and Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml. This parser uses namespaces and XPath to parse XML content. The config file does the mapping between the XML nodes (using XPath) and Lucene fields.
Example:
    <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields represented by the namespace will be indexed.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
        <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
        <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
    </xmlIndexerProperties>

4- It is possible to define a default namespace that will be applied when the parser does not find any namespace in the document, or when the namespace found in the XML document does not match the namespace defined in the xmlIndexerProperties.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="default">
        <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
    </xmlIndexerProperties>

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
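
The mapping rules in steps 2 to 4 of the description can be illustrated with the JDK's own XPath API. This is a rough sketch, not the parse-xml plugin's code: the class `XmlFieldMappingSketch`, the method `extractFields`, and the two maps standing in for the xmlIndexerProperties blocks are hypothetical, and the `type` and `boost` attributes are left out. It also assumes that "the namespace is found in the XML document" means at least one element belongs to that namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlFieldMappingSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Binds the "dc" prefix used in the XPath expressions to the Dublin Core namespace. */
    static final NamespaceContext DC_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return DC_NS.equals(uri) ? "dc" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return DC_NS.equals(uri)
                    ? Collections.singletonList("dc").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /**
     * Returns field name -> extracted text. dcFields and defaultFields map a
     * field name to an XPath expression, like the field entries in the config.
     */
    static Map<String, String> extractFields(String xml,
                                             Map<String, String> dcFields,
                                             Map<String, String> defaultFields) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so that dc:title matches by namespace
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(DC_CONTEXT);

        // Assumption: the namespace counts as "found" if any element is in it
        Boolean hasDc = (Boolean) xpath.evaluate(
                "boolean(//*[namespace-uri()='" + DC_NS + "'])", doc, XPathConstants.BOOLEAN);
        Map<String, String> mapping = hasDc ? dcFields : defaultFields;

        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            // String evaluation yields the text of the first matching node
            fields.put(e.getKey(), xpath.evaluate(e.getValue(), doc));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> dcFields = new LinkedHashMap<>();
        dcFields.put("dctitle", "//dc:title");
        dcFields.put("dccreator", "//dc:creator");
        Map<String, String> defaultFields = Collections.singletonMap("xmlcontent", "//*");

        String dcXml = "<record xmlns:dc='" + DC_NS + "'>"
                + "<dc:title>Lius</dc:title><dc:creator>Rida</dc:creator></record>";
        System.out.println(extractFields(dcXml, dcFields, defaultFields));
        System.out.println(extractFields("<note><body>hello</body></note>", dcFields, defaultFields));
    }
}
```

Note that evaluating an expression as a string keeps only the first matching node, so a real mapping for repeated elements would need to iterate over the node set instead.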
[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
    Affects Version/s: 0.8.1
                       0.8

XMLParser is configurable xml parser plugin.
    Key: NUTCH-185
    URL: http://issues.apache.org/jira/browse/NUTCH-185
    Project: Nutch
    Issue Type: New Feature
    Components: fetcher, indexer
    Affects Versions: 0.7.2, 0.8.1, 0.8
    Environment: OS Independent
    Reporter: Rida Benjelloun
    Attachments: parse-xml.zip, parse-xml.zip

Xml parser is a configurable plugin. It uses XPath and namespaces to do the mapping between the XML elements and Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml. This parser uses namespaces and XPath to parse XML content. The config file does the mapping between the XML nodes (using XPath) and Lucene fields.
Example:
    <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields represented by the namespace will be indexed.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
        <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
        <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
    </xmlIndexerProperties>

4- It is possible to define a default namespace that will be applied when the parser does not find any namespace in the document, or when the namespace found in the XML document does not match the namespace defined in the xmlIndexerProperties.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="default">
        <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
    </xmlIndexerProperties>
Re: [Proposal] New Lucene sub-project
Hi Jérôme,

I find your idea very interesting. I would be interested in contributing to the Parse Plugins Framework. I have developed a similar one using Lucene. The project name is Lius. If you are interested, please let me know.

On 4/7/06, Jérôme Charron [EMAIL PROTECTED] wrote:
> Hi all,
>
> While chatting with Chris Mattmann, it seems to be evident to us that there is a need for a new sub-project within Lucene. For now, Lucene's sub-projects used in Nutch are:
>
> 1. Lucene-java - The basis for search technology
> 2. Hadoop - The distributed computing platform
> 3. Nutch - The search engine that relies on Lucene and Hadoop.
>
> Since Nutch contains some value added pieces of code that focus on content analysis, we think it would be a good idea to split Nutch into a new sub-project based on content analysis manipulation. The components we have identified are:
>
> 1. MimeType Repository
> 2. Language Identifier
> 3. Content Signature (MD5Signature / TextProfileSignature / ...)
> (4. Generic Meta Data Infrastructure)
> (5. Charset Detector)
> (6. Parse Plugins Framework)
>
> The idea is to expose these pieces of codes into a standalone lib, since we are convinced they could be useful in many other projects than Nutch. The benefits will be to have some code more widely used / tested / contributed.
>
> If this proposal is accepted, we have a candidate name for this new project: Tika (comes from my son ;-) )
>
> Any comment is welcome.
>
> Jérôme
Nutch plugin
Hi, I would like to know what process Nutch uses to evaluate a plugin contribution and add it to the Nutch distribution. I have created this issue :
[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
    Summary: XMLParser is configurable xml parser plugin.
    (was: XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.)

    Description:

Xml parser is a configurable plugin. It uses XPath and namespaces to do the mapping between the XML elements and Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml. This parser uses namespaces and XPath to parse XML content. The config file does the mapping between the XML nodes (using XPath) and Lucene fields.
Example:
    <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields represented by the namespace will be indexed.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
        <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
        <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
    </xmlIndexerProperties>

4- It is possible to define a default namespace that will be applied when the parser does not find any namespace in the document, or when the namespace found in the XML document does not match the namespace defined in the xmlIndexerProperties.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="default">
        <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
    </xmlIndexerProperties>

    was:

XMLParser is a configurable plugin. It uses XPath and namespaces to do the mapping between the XML elements and Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml. This parser uses namespaces and XPath to parse XML content. The config file does the mapping between the XML nodes (using XPath) and Lucene fields.
Example:
    <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields represented by the namespace will be indexed.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
        <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
        <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
    </xmlIndexerProperties>

4- It is possible to define a default namespace that will be applied when the parser does not find any namespace in the document, or when the namespace found in the XML document does not match the namespace defined in the xmlIndexerProperties.
Example:
    <xmlIndexerProperties type="filePerDocument" namespace="default">
        <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
    </xmlIndexerProperties>

XMLParser is configurable xml parser plugin.
    Key: NUTCH-185
    URL: http://issues.apache.org/jira/browse/NUTCH-185
    Project: Nutch
    Type: New Feature
    Components: fetcher, indexer
    Versions: 0.7.2-dev
    Environment: OS Independent
    Reporter: Rida Benjelloun
    Attachments: parse-xml.zip
Re: need volunteer to develop search for apache.org
Hi Doug,

I would be interested in this development. I have a lot of experience with Lucene.

Best regards

On 1/27/06, Fuad Efendi [EMAIL PROTECTED] wrote:
> Hope to join! +1
>
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 25, 2006 4:24 PM
> To: nutch-dev@lucene.apache.org
> Subject: need volunteer to develop search for apache.org
>
> Would someone volunteer to develop Nutch-based site-search engine for all apache.org domains? We now have a Solaris zone to host this.
>
> Thanks,
> Doug

--
Rida Benjelloun
President and CEO
DocuLibre inc.
Telephone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]
Re: [jira] Commented: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.
Hi Philippe,

Thanks for your comments. I have already added multiple values for a field in Lucene. I will try it with the Nutch plugin.

Best regards.

On 1/26/06, Philippe EUGENE (JIRA) [EMAIL PROTECTED] wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12364087 ]
>
> Philippe EUGENE commented on NUTCH-185:
> ---
> Great plugin. Thanks! I successfully tested this plugin on Nutch 0.7.1. I just have a problem with some structures like this:
>     <authors>
>       <author>author1</author>
>       <author>author2</author>
>       <author>author3</author>
>     </authors>
> In my Lucene index I see only the author3 value for this field. I'm not sure that the problem is in the plugin. I don't know whether it is possible to have multiple values for a field in Nutch 0.7.1.
>
> > XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.
> > ---
> >     Key: NUTCH-185
> >     URL: http://issues.apache.org/jira/browse/NUTCH-185
> >     Project: Nutch
> >     Type: New Feature
> >     Components: fetcher, indexer
> >     Versions: 0.7.2-dev
> >     Environment: OS Independent
> >     Reporter: Rida Benjelloun
> >     Attachments: parse-xml.zip
> >
> > XMLParser is a configurable plugin. It uses XPath and namespaces to map XML elements to Lucene fields.
> >
> > Information:
> > 1- Copy xmlparser-conf.xml to the nutch/conf directory.
> > 2- To index your custom XML file, modify xmlparser-conf.xml. The parser uses namespaces and XPath to parse XML content. The config file maps XML nodes (selected with XPath) to Lucene fields. Example:
> >     <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
> > 3- An xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields defined for that namespace are indexed. Example:
> >     <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
> >       <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
> >       <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
> >     </xmlIndexerProperties>
> > 4- It is possible to define a default namespace, which is applied when the parser finds no namespace in the document or when the namespace found in the XML document does not match any namespace defined in an xmlIndexerProperties element. Example:
> >     <xmlIndexerProperties type="filePerDocument" namespace="default">
> >       <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
> >     </xmlIndexerProperties>

--
Rida Benjelloun
President and CEO
DocuLibre inc.
Phone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]
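Philippe's symptom (only author3 ends up in the index) is what one would see if each match overwrote the previous value of the field. That cause is only a guess; the thread does not confirm where the values are lost. The sketch below, in Python rather than the plugin's Java, contrasts the two behaviours. `SAMPLE`, `last_value_only` and `all_values` are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The structure from Philippe's comment, with its markup restored.
SAMPLE = (
    "<authors>"
    "<author>author1</author>"
    "<author>author2</author>"
    "<author>author3</author>"
    "</authors>"
)


def last_value_only(xml_text, name, path):
    """Single-valued storage: every match overwrites the previous one."""
    fields = {}
    for node in ET.fromstring(xml_text).findall(path):
        fields[name] = node.text
    return fields


def all_values(xml_text, name, path):
    """Multi-valued storage: every match is appended under the same field name."""
    fields = defaultdict(list)
    for node in ET.fromstring(xml_text).findall(path):
        fields[name].append(node.text)
    return dict(fields)
```

With the sample above, the first function keeps only the last author while the second keeps all three, which is the behaviour a multi-valued field needs.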
[jira] Created: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.
XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.

    Key: NUTCH-185
    URL: http://issues.apache.org/jira/browse/NUTCH-185
    Project: Nutch
    Type: New Feature
    Components: fetcher, indexer
    Versions: 0.7.2-dev
    Environment: OS Independent
    Reporter: Rida Benjelloun

XMLParser is a configurable plugin. It uses XPath and namespaces to map XML elements to Lucene fields.

Information:
1- Copy xmlparser-conf.xml to the nutch/conf directory.
2- To index your custom XML file, modify xmlparser-conf.xml. The parser uses namespaces and XPath to parse XML content. The config file maps XML nodes (selected with XPath) to Lucene fields. Example:
    <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
3- An xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields defined for that namespace are indexed. Example:
    <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
      <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
      <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
    </xmlIndexerProperties>
4- It is possible to define a default namespace, which is applied when the parser finds no namespace in the document or when the namespace found in the XML document does not match any namespace defined in an xmlIndexerProperties element. Example:
    <xmlIndexerProperties type="filePerDocument" namespace="default">
      <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
    </xmlIndexerProperties>
xml-parser plugin contribution
Hi, I have developed an XML parser plugin. I have tested it with Nutch 0.7.2. The parser uses namespaces and XPath to map XML nodes to Lucene fields. I'm trying to send the source of the plugin in a zip file, but my message is always rejected (it is considered spam). How can I send the source code? Best regards.
Class MultiProperties
Hi all, I'm using the Nutch 0.7.1 jar, and I'm not able to find the class MultiProperties in the package org.apache.nutch.protocol.httpclient. When I look at the javadoc for version 0.7.1, this class exists. Could you please help me? Best regards.
OpenOffice and Excel parsers
Hi, Is someone working on OpenOffice and Excel parsers? I have already developed them in Lius (http://sourceforge.net/projects/lius) and I want to adapt them for Nutch. I have checked the SVN repository and I didn't find OpenOffice or Excel parsers. Best regards, Rida.