Re: Speculating about the removal of the standalone Solr mode

2016-03-09 Thread Rida Benjelloun
Personally, I think this is a very good idea. In Constellio we use SolrCloud
and ZooKeeper for sharding. I think the approach you suggest will
simplify and normalize the code.
Regards.
On 2016-03-09 3:57 PM, "Shawn Heisey" wrote:

> On 3/9/2016 10:43 AM, Joel Bernstein wrote:
> > From Alfresco's point of view this would be a bad thing. Alfresco uses
> > Solr in stand alone mode and has developed an entire sharding and
> > replication model that fits the ECM use case. So being forced to have
> > ZooKeeper and Solr Cloud would not be ideal.
>
> I'm aware of the potential pain.  Third-party Solr support and the
> documentation that goes with it might require significant changes -- but
> those changes will already be required if those packages want to add
> support for talking to SolrCloud in general.
>
> I firmly believe that "cloud mode only" is the way Solr is headed, and
> that once we reach the other side, Solr will be better, especially
> because of documentation and API consolidation.
>
> My intent would not be to force SolrCloud's built-in sharding on
> everyone.  You could still do completely manual sharding and work with
> individual Solr nodes like before.  The difference would be that each
> "standalone" Solr node would internally use zookeeper (probably the
> embedded server) to manage itself.  We might need to invent a
> "standalone collection" concept that could be used for
> single-shard-single-replica collections, where the core name and the
> collection name are the same, instead of cores named foo_shardN_replicaN.
>
> I myself would feel a lot of the pain you mentioned in relation to
> Alfresco.  I'm also manually managing shards on standalone Solr
> instances.  I've got a significant investment in a SolrJ application to
> handle these indexes.
>
> The change should not happen before 7.0, and with a major change like
> that on the horizon, the major version *before* the change (such as 6.x)
> should remain the stable branch for quite a while, so everybody has time
> to update and support cloud mode before it becomes mandatory.  There
> will be a lot of details to iron out.  I hope to be able to help with that.
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


The Constellio team is proud to release its version 1.2

2011-03-02 Thread Rida Benjelloun
The Constellio Enterprise Search team is proud to launch version 1.2 of its
powerful Open Source search engine software. Based on Google Search
Appliance connectors and Apache Solr, Constellio lets users find all
available information with a single search. The software also complies with
the security requirements of the organisation.

This new version brings significant improvements to the search engine.
Constellio is now CMIS compliant, which allows it to connect to all
systems that support this standard, such as Alfresco, Nuxeo, Liferay,
Drupal, etc.

We also want to thank everyone who has downloaded Constellio. Many important
changes have been made in this new version.

Here are some of the new features developed in version 1.2:

• CMIS connector
• XML connector using the XPath language
• Alert mechanism and RSS feeds
• Google OneBox technology integration
• Automatic recording of search history
• Registration of favorite documents
• Registration of favorite queries
• Improvement of the learning mechanism used in the relevance
  calculation of search results
• Sitemaps protocol support for HTTP connectors
• Multiselect facets using Ajax technology

-- 
-
Rida Benjelloun
Constellio -  Doculibre
ridabenjell...@apache.org
rida.benjell...@doculibre.com
-



The Constellio team is proud to release its version 1.1

2010-12-19 Thread Rida Benjelloun
The Constellio team is proud to release its version 1.1

Constellio Open Source Enterprise Search is based on Apache Solr and uses
the Google Search Appliance connector architecture. It lets you find, with a
single click, all relevant content in your organization (Web, email, ECM,
CRM, etc.).

Please be advised that the Constellio licence has changed from GPL v3.0 to
LGPL v3.0.

The new LGPL v3.0 licence gives more flexibility to developers interested
in developing plug-ins/modules or integrating Constellio with other
solutions. The SVN repository (svn.constellio.com) and the issue tracker
(issues.constellio.com) are now also open.

Many important changes have been made in this new version.

Here are some of the new features developed in version 1.1:

   - Constellio multi-platform installer
   - Federated search
   - Document security
   - Autocomplete for simple search based on the most popular queries
   - Configurable advanced search interface and autocomplete based on field
     content
   - Solr connector (upload your schema.xml and content files, XML and
     binary)
   - Activation of Solr HTTP Web services, with the Constellio spell checker
     available through these services
   - Implementation of multiselect faceting
   - Configuration of display fields
   - Document views used in the relevance calculation of search
     results
   - Added field boost, document boost, and Solr dismax (relevance)
   - Added Carrot2 for faceting
   - Web crawler improvements
   - New theme
   - and more ...

Your comments and suggestions are welcome.

-- 
-
Rida Benjelloun
Constellio -  Doculibre
ridabenjell...@apache.org
rida.benjell...@doculibre.com
-




Constellio Enterprise Search announces its first Open Source release

2010-09-22 Thread Rida Benjelloun
The Constellio team is proud to announce the release of the first Open
Source version of Constellio Enterprise Search. It is available for download
at the following address: http://www.constellio.com

Based on Apache Solr and using the Google Search Appliance connector
architecture, Constellio provides a solution for indexing all sources of
information in your business.

Some key features:

• Administration interface
• Security management
• Federated search across all of the organization's data
• Discovery tool (faceted search)
• Real-time indexing; Web, file system, mail, database, ECM and CRM crawlers
• Search engine collaboration (tagging, best bets, synonyms)
• Machine learning, classification and clustering
• Thesaurus and taxonomy support using OWL and SKOS
• Entity extraction
• Multilingual spell checker
• Reports and statistics on indexing and search
• Multilingual interface support (I18N)
• Support for over 15 languages (with stemming)
• Document and field boosting
• Search result sorting
• And more…



-
Rida Benjelloun
ridabenjell...@apache.org
rida.benjell...@doculibre.com
-


Re: [VOTE] Apache Tika 0.3 release candidate 2

2009-03-16 Thread Rida Benjelloun
Hi,
+1 Release the packages as Apache Tika 0.3.
Regards.

2009/3/15 Jukka Zitting jukka.zitt...@gmail.com

> Hi,
>
> On Fri, Mar 13, 2009 at 8:43 PM, Mattmann, Chris A
> chris.a.mattm...@jpl.nasa.gov wrote:
> > Please vote on releasing these packages as Apache Tika 0.3. The vote is open
> > for the next 72 hours. Only votes from Lucene PMC are binding, but everyone
> > is welcome to check the release candidate and voice their approval or
> > disapproval. The vote passes if at least three binding +1 votes are cast.
>
> [x] +1 Release the packages as Apache Tika 0.3.
>
> BR,
>
> Jukka Zitting



Contributing Lius to Lucene

2007-10-05 Thread Rida Benjelloun
Hi,
In February 2007, I sent a message to the Lucene dev list about bringing Lius
into the Apache Incubator. Several solutions were proposed, among them adding
Lius under Lucene contrib. The solution we settled on is to start the Tika
project in the Incubator and to contribute the Lius parsers to it. Version
0.1 of Tika will be ready in 2 or 3 weeks.
I will integrate Tika into Lius and refactor it, and I would like to contribute
Lius to Lucene. Lius is currently downloaded 400 times per month.
Lius is under the ASF licence.
Lius on SourceForge: http://sourceforge.net/projects/lius/


Regards.

Rida Benjelloun
[EMAIL PROTECTED]


[jira] Updated: (SOLR-209) Multifields and multivalued facets

2007-04-13 Thread Rida Benjelloun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rida Benjelloun updated SOLR-209:
-

Attachment: MultiFieldsHitsFacets.java

Java Class

 Multifields and multivalued facets
 --

 Key: SOLR-209
 URL: https://issues.apache.org/jira/browse/SOLR-209
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.1.0
 Environment: Java
Reporter: Rida Benjelloun
 Fix For: 1.1.0

 Attachments: MultiFieldsHitsFacets.java, MultiFieldsHitsFacets.patch


 MultiFieldsHitsFacets increases the performance of faceting on multiValued
 fields by creating facets from Lucene Hits. It also allows the creation of
 facets using multiple fields. The fields must be separated by a single space.
 Example: facet.field=subject subjectG subjectA
 Rida Benjelloun
 [EMAIL PROTECTED]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-209) Multifields and multivalued facets

2007-04-13 Thread Rida Benjelloun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rida Benjelloun updated SOLR-209:
-

Attachment: MultiFieldsHitsFacets.patch




[jira] Commented: (SOLR-209) Multifields and multivalued facets

2007-04-13 Thread Rida Benjelloun (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488754
 ] 

Rida Benjelloun commented on SOLR-209:
--

Hi Yonik,
MultiFieldsHitsFacets uses Lucene Hits to create the facets. For each field
on which we want to facet, I get the field content from the Lucene
document in the search results and check whether it is already a key in the
HashMap. If not, I add it as a key; otherwise I get the Map value, which is
an Integer, and increment it. Finally, I sort the HashMap.
With this approach you can tokenize your fields without any impact on the
faceting mechanism, because I use the stored string content to build the
facets; however, the field must be stored.
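
The counting step described above can be sketched in plain Java. This is only an illustration and not the code attached to SOLR-209: the class name `HitFacetCounter` and the representation of each hit as a map from field name to stored values are assumptions made for the example, and merging the values of all listed fields into one set of counters is one possible reading of "facet using multiple fields".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds facet counts from the stored values of matching documents. */
final class HitFacetCounter {

    /**
     * @param hits        one entry per matching document: field name -> stored values
     * @param facetFields space-separated field names, e.g. "subject subjectG subjectA"
     * @return (value, count) pairs, highest count first, ties broken alphabetically
     */
    static List<Map.Entry<String, Integer>> count(
            List<Map<String, List<String>>> hits, String facetFields) {
        String[] fields = facetFields.trim().split("\\s+");
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, List<String>> doc : hits) {
            for (String field : fields) {
                for (String value : doc.getOrDefault(field, List.of())) {
                    // Add the value as a new key, or increment its existing counter.
                    counts.merge(value, 1, Integer::sum);
                }
            }
        }
        // A HashMap has no order, so sort a copy of its entries instead.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> hits = List.of(
                Map.of("subject", List.of("lucene", "solr")),
                Map.of("subject", List.of("lucene"), "subjectG", List.of("nutch")));
        System.out.println(count(hits, "subject subjectG"));
    }
}
```

Because the counters are keyed on the stored string, the analyzer applied at index time does not affect the facet values, which matches the remark that the field must be stored. The cost grows with the number of hits, since every matching document has to be read.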
Regards.

Rida Benjelloun




Re: [jira] Lius into apache incubator

2007-03-01 Thread Rida Benjelloun

Hi,
You can already use Lius as a text extraction API: for each Indexer I have
implemented a method that returns the String content of the
document.
Lius could be used as a starting point for the Tika project, if the Tika
committers are interested in it. As Mark said, we can also decouple Lius's
parser logic from its indexing logic.
Taking the project into the Apache Incubator could also be interesting, to get
more people involved in it.

My goal is to join our efforts to build a framework for text extraction.
Here is an example of text extraction with Lius:

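// Load the Lius configuration from the given path.
LiusConfig lc =
    LiusConfigBuilder.getSingletonInstance().getLiusConfig(liusConfigPathString);

// Get the indexer for this document, then read its text content.
Indexer indexer = IndexerFactory.getIndexer(documentToIndex, lc);
String text = indexer.getContent();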


On 3/1/07, Jukka Zitting [EMAIL PROTECTED] wrote:

> Hi,
>
> I am interested in a Lius/Tika project that could be used not only with
> Lucene. As mentioned by Mark, there are a number of related efforts, which
> leads me to believe an application-independent content analysis/parsing
> tool would be very helpful for many users.
>
> I'd like to propose taking the project to the Apache Incubator to better
> attract interest also from outside Lucene.
>
> BR,
>
> Jukka Zitting





Re: [jira] Lius into apache incubator

2007-03-01 Thread Rida Benjelloun

Hi,
On 3/1/07, Jukka Zitting [EMAIL PROTECTED] wrote:

> Hi,
>
> On 3/1/07, Rida Benjelloun [EMAIL PROTECTED] wrote:
> > Lius could be used as a starting point of Tika project, if Tika committers
> > are interested on it. We can also as mark said decouple Lius's parser logic
> > from it's indexing logic.
>
> I'm very interested in doing that. Another very useful codebase, among
> others, would be the existing parser framework in the Nutch project.

-- I agree

> > Taking the project into Apache incubator could be also interesting, to get
> > more people involved on it.
>
> Exactly. I'd like to avoid starting just yet another codebase, and
> focus more on bringing the best parts (both code and ideas) of the
> existing projects together. The community-building focus of the
> Incubator would likely help with that. Another aspect that would
> benefit from the Incubator scrutiny are the legal implications of
> pulling together multiple document parser libraries under various
> different licenses.
>
> Would there be interest within the Lucene PMC in sponsoring a proposal
> along such lines? I can volunteer to put together the proposal and act
> as the champion and mentor of the project.

-- We can put together the proposal and you can be the mentor of the
project.

> BR,
>
> Jukka Zitting





--
---
Rida Benjelloun, M.S.I., M.B.A.
President and Chief Executive Officer
DocuLibre inc.
Telephone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]
---


Re: [jira] Lius into apache incubator

2007-03-01 Thread Rida Benjelloun

Hi,
Thanks Doug, I think your help as a mentor will be very much appreciated.
Regards.

On 3/1/07, Doug Cutting [EMAIL PROTECTED] wrote:

> Jukka Zitting wrote:
> > PS. Will people mind if we use this list for fleshing out the details?
> > I've created a Google Group for Tika where we could also take the
> > discussion if that's preferred.
>
> I think the Incubator Wiki would be the best place for this.
>
> http://wiki.apache.org/incubator/?action=fullsearch&value=proposal&titlesearch=Titles
>
> Interested folks could subscribe to the proposal page.  You could
> announce the proposal page on several lists.  Will that work for you?
>
> Also, I can probably help as a mentor if needed.
>
> Doug




Re: Lius into apache incubator

2007-02-28 Thread Rida Benjelloun

Hi Otis,
Many thanks for your comments, and sorry for the late answer. I will add
Lius as a Lucene contrib and change the licence to ASL.
There are some developers contributing to Lius, but they are not very
active.
As for the question "this is a Laval University project, right?  But you work
at DocuLibre?":
I developed Lius during my studies at Laval University. I am still the
copyright owner of the project, so I can change the licence to ASL without any
problem. Lius has been used in several projects at Laval University, so I
decided to host it at Laval.
I work at both Laval and DocuLibre.

Tika is a really good project and I'm very interested in joining it.

Regards.


On 1/31/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:

> Hi Rida,
>
> Some comments in no particular order:
>
> - Looks useful
>
> - This looks like a more expanded version of what Erik and I wrote for
> LIA, and I know people often ask and use that code, so I know there is a
> need for a framework that knows how to parse various document formats
>
> - Nutch has some of the document parsing code written in form of
> plugins.  A few people wanted to decouple that from Nutch in a Tika project:
> http://code.google.com/p/tika/ .  Not sure what the status is, I think
> only Jukka Zitting did any work there, but I think the initial idea was
> never fully finished.  If LIUS joins Lucene, I think some of this
> duplication should be cleaned up, so we have only one framework for parsing
> various types of document formats.
>
> - Going through the Incubator is one way to go.  Perhaps another way to
> get LIUS under Lucene is to just place it under contrib/, say contrib/lius.
>
> - Licensing would have to change to ASL and you would probably also have
> to send in your ASF CLA.
>
> - Any dependencies on GPL or LGPL or code released under other licenses
> would have to either be removed, or you'd have to fetch the required Jars at
> compile/build time.  A few projects under Lucene contrib/ already do that, I
> believe
>
> - Are there developers who are actively working on LIUS?  Fixing bugs,
> adding features, keeping up with new versions of dependencies, etc.
>
> Otis
> P.S.
> Out of curiosity - this is a Laval University project, right?  But you
> work at DocuLibre?
>
> ----- Original Message -----
> From: Rida Benjelloun [EMAIL PROTECTED]
> To: java-user@lucene.apache.org; java-dev@lucene.apache.org
> Sent: Tuesday, January 30, 2007 7:27:28 PM
> Subject: Lius into apache incubator
>
> Hi,
> I would like to add the Lius framework (http://sourceforge.net/projects/lius/)
> to the Apache Incubator. Are there volunteers to do this job and to
> contribute to the development of this project?
>
> Thanks.
>
> Rida Benjelloun.




Phrase query analysis-fr

2006-12-02 Thread Rida Benjelloun

Hi,
When I use analysis-fr for indexing and searching, I'm not able to search with
phrase queries. I'm using nutch-0.8.1.

Could someone help?
Best regards


[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-10-23 Thread Rida Benjelloun (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
--

Attachment: parse-xml.zip

Hi,
The parse-xml plugin has been updated. I have tested it with version 0.8.1. The
plugin also fixes the bug related to multi-valued fields.

Best regards 

Rida Benjelloun.
[EMAIL PROTECTED]

 XMLParser is configurable xml parser plugin.
 

 Key: NUTCH-185
 URL: http://issues.apache.org/jira/browse/NUTCH-185
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher, indexer
Affects Versions: 0.7.2
 Environment: OS Independent
Reporter: Rida Benjelloun
 Attachments: parse-xml.zip, parse-xml.zip


 The XML parser is a configurable plugin. It uses XPath and namespaces to map
 XML elements to Lucene fields.
 Information:
 1- Copy xmlparser-conf.xml to the nutch/conf dir.
 2- To index your custom XML file, you have to modify
 xmlparser-conf.xml.
 This parser uses namespaces and XPath to parse XML content.
 The config file maps XML nodes (using XPath) to Lucene
 fields.
 Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
 3- The xmlIndexerProperties element encapsulates a set of fields associated
 with a namespace.
 If the namespace is found in the XML document, the fields represented by the
 namespace will be indexed.
 Example:
 <xmlIndexerProperties type="filePerDocument"
     namespace="http://purl.org/dc/elements/1.1/">
   <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
   <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
 </xmlIndexerProperties>
 4- It is possible to define a default namespace that is applied when the
 parser does not find any namespace in the document, or when the namespace
 found in the XML document does not match the namespace defined in
 xmlIndexerProperties.
 Example:
 <xmlIndexerProperties type="filePerDocument" namespace="default">
   <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
 </xmlIndexerProperties>
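
The XPath-to-field mapping in the description can be illustrated with the JDK's own `javax.xml.xpath` API. This is a minimal sketch under stated assumptions, not the plugin's code: the class `XPathFieldMapper` and its `extract` method are hypothetical names, the mapping is passed in as a plain Java map instead of being read from xmlparser-conf.xml, only the Dublin Core `dc` prefix is bound, and the default-namespace fallback from point 4 is not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: evaluates one XPath per index field against an XML document. */
final class XPathFieldMapper {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Resolves only the "dc" prefix used by the XPath expressions. */
    private static final NamespaceContext DC_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return DC_NS.equals(uri) ? "dc" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    /** Returns field name -> all matching text values, in document order. */
    static Map<String, List<String>> extract(String xml, Map<String, String> fieldToXPath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(DC_CONTEXT);

            Map<String, List<String>> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> mapping : fieldToXPath.entrySet()) {
                NodeList nodes = (NodeList) xpath.evaluate(
                        mapping.getValue(), doc, XPathConstants.NODESET);
                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    values.add(nodes.item(i).getTextContent());
                }
                if (!values.isEmpty()) {
                    fields.put(mapping.getKey(), values);
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record xmlns:dc='" + DC_NS + "'>"
                + "<dc:title>Lucene in Action</dc:title>"
                + "<dc:creator>Otis</dc:creator><dc:creator>Erik</dc:creator></record>";
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("dctitle", "//dc:title");
        mapping.put("dccreator", "//dc:creator");
        System.out.println(extract(xml, mapping));
    }
}
```

Returning a list per field keeps every match, so an element that repeats in the source document, such as `dc:creator` here, yields a multi-valued field. How the real plugin applies the `type` and `boost` attributes is not shown in the description, so they are left out of the sketch.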

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-10-23 Thread Rida Benjelloun (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
--

Affects Version/s: 0.8.1
   0.8





Re: [Proposal] New Lucene sub-project

2006-04-07 Thread Rida Benjelloun
Hi Jérôme,

I find your idea very interesting, and I would be interested in contributing to
the Parse Plugins Framework. I have developed a similar one using Lucene. The
project is named Lius.

If you are interested, please let me know.



On 4/7/06, Jérôme Charron [EMAIL PROTECTED] wrote:

> Hi all,
>
> While chatting with Chris Mattmann, it seems to be evident to us that there
> is a need for a new sub-project within Lucene.
>
> For now, Lucene's sub-projects used in Nutch are :
> 1. Lucene-java - The basis for search technology
> 2. Hadoop - The distributed computing platform
> 3. Nutch - The search engine that relies on Lucene and Hadoop.
>
> Since Nutch contains some value added pieces of code that focus on content
> analysis, we think it would be a good idea to split Nutch into a new
> sub-project based on content analysis manipulation. The components we have
> identified are :
>
> 1. MimeType Repository
> 2. Language Identifier
> 3. Content Signature (MD5Signature / TextProfileSignature / ...)
> (4. Generic Meta Data Infrastructure)
> (5. Charset Detector)
> (6. Parse Plugins Framework)
>
> The idea is to expose these pieces of codes into a standalone lib, since we
> are convinced they could be useful in many other projects than Nutch.
> The benefits will be to have some code more widely used / tested /
> contributed.
> If this proposal is accepted, we have a candidate name for this new project:
> Tika (comes from my son  ;-) )
>
> Any comment is welcome.
>
> Jérôme




Nutch plugin

2006-02-03 Thread Rida Benjelloun
Hi,
I would like to know what process Nutch uses to evaluate a plugin
contribution and add it to the Nutch distribution.
I have created this issue:


[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-02-01 Thread Rida Benjelloun (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]

Rida Benjelloun updated NUTCH-185:
--

Summary: XMLParser is configurable xml parser plugin.   (was: XMLParser 
is configurable plugin. It use XPath and namespaces to do the mapping between 
the XML elements and Lucene fields.)
Description: 
Xml parser is a configurable plugin. It uses XPath and namespaces to map
XML elements to Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml.
This parser uses namespaces and XPath to parse XML content.
The config file maps XML nodes (using XPath) to Lucene
fields.
Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated
with a namespace.
If the namespace is found in the XML document, the fields represented by the
namespace will be indexed.
Example:
<xmlIndexerProperties type="filePerDocument"
    namespace="http://purl.org/dc/elements/1.1/">
  <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
  <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
</xmlIndexerProperties>

4- It is possible to define a default namespace that is applied when the
parser does not find any namespace in the document, or when the namespace
found in the XML document does not match the namespace defined in
xmlIndexerProperties.
Example:
<xmlIndexerProperties type="filePerDocument" namespace="default">
  <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
</xmlIndexerProperties>


  was:
XMLParser is a configurable plugin. It uses XPath and namespaces to map
XML elements to Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf dir.

2- To index your custom XML file, you have to modify xmlparser-conf.xml.
This parser uses namespaces and XPath to parse XML content.
The config file maps XML nodes (using XPath) to Lucene
fields.
Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated
with a namespace.
If the namespace is found in the XML document, the fields represented by the
namespace will be indexed.
Example:
<xmlIndexerProperties type="filePerDocument"
    namespace="http://purl.org/dc/elements/1.1/">
  <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
  <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
</xmlIndexerProperties>

4- It is possible to define a default namespace that is applied when the
parser does not find any namespace in the document, or when the namespace
found in the XML document does not match the namespace defined in
xmlIndexerProperties.
Example:
<xmlIndexerProperties type="filePerDocument" namespace="default">
  <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
</xmlIndexerProperties>



 XMLParser is configurable xml parser plugin. 
 -

  Key: NUTCH-185
  URL: http://issues.apache.org/jira/browse/NUTCH-185
  Project: Nutch
 Type: New Feature
   Components: fetcher, indexer
 Versions: 0.7.2-dev
  Environment: OS Independent
 Reporter: Rida Benjelloun
  Attachments: parse-xml.zip


Re: need volunteer to develop search for apache.org

2006-01-27 Thread Rida Benjelloun
Hi Doug,

I would be interested in this development. I have a lot of experience with
Lucene.
Best regards


On 1/27/06, Fuad Efendi [EMAIL PROTECTED] wrote:

 Hope to join!
 +1


 -Original Message-
 From: Doug Cutting [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 25, 2006 4:24 PM
 To: nutch-dev@lucene.apache.org
 Subject: need volunteer to develop search for apache.org


 Would someone volunteer to develop a Nutch-based site-search engine for
 all apache.org domains?  We now have a Solaris zone to host this.

 Thanks,

 Doug





--

Rida Benjelloun
President and CEO
DocuLibre inc.
Telephone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]



Re: [jira] Commented: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.

2006-01-27 Thread Rida Benjelloun
Hi Philippe,

Thanks for your comments. I have already added multiple values for a field in
Lucene. I will try it with the Nutch plugin.

Best regards.




On 1/26/06, Philippe EUGENE (JIRA) [EMAIL PROTECTED] wrote:

[
 http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12364087]

 Philippe EUGENE commented on NUTCH-185:
 ---

 Great plugin. Thanks!
 I successfully tested this plugin on version 0.7.1 of Nutch.
 I just have a problem with some structures like this:
 <authors>
 <author>author1</author>
 <author>author2</author>
 <author>author3</author>
 </authors>

 In my Lucene index I only see the author3 value for this field.
 I'm not sure that the problem is in the plugin.
 I don't know if it's possible to have multiple values for a field in Nutch
 0.7.1.
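
One way a symptom like this can arise is sketched below in Python. It is not a diagnosis of the plugin or of Nutch 0.7.1: both functions are hypothetical and only contrast storing each match under a single-valued key (the last match wins) with appending every match. In Lucene itself, a document can carry several fields with the same name, which is how a multi-valued field is normally represented.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<authors>
  <author>author1</author>
  <author>author2</author>
  <author>author3</author>
</authors>"""


def index_last_wins(root):
    """Assigning into a single-valued map keeps only the last match."""
    doc = {}
    for node in root.findall(".//author"):
        doc["author"] = node.text
    return doc


def index_multi_valued(root):
    """Appending every match keeps all values for the field."""
    doc = {}
    for node in root.findall(".//author"):
        doc.setdefault("author", []).append(node.text)
    return doc


root = ET.fromstring(SAMPLE)
print(index_last_wins(root))     # {'author': 'author3'}
print(index_multi_valued(root))  # {'author': ['author1', 'author2', 'author3']}
```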

  XMLParser is configurable plugin. It use XPath and namespaces to do the
 mapping between the XML elements and Lucene fields.
 
 ---
 
          Key: NUTCH-185
          URL: http://issues.apache.org/jira/browse/NUTCH-185
      Project: Nutch
         Type: New Feature
   Components: fetcher, indexer
     Versions: 0.7.2-dev
  Environment: OS Independent
     Reporter: Rida Benjelloun
  Attachments: parse-xml.zip
 
 XMLParser is a configurable plugin. It uses XPath and namespaces to map
 XML elements to Lucene fields.
 Information:
 1- Copy xmlparser-conf.xml to the nutch/conf directory.
 2- To index your custom XML file, you have to modify xmlparser-conf.xml.
 This parser uses namespaces and XPath to parse XML content.
 The config file maps XML nodes (selected with XPath) to Lucene fields.
 Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
 3- The xmlIndexerProperties element encapsulates a set of fields associated
 with a namespace.
 If the namespace is found in the XML document, the fields defined for that
 namespace will be indexed.
 Example:
 <xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
   <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
   <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
 </xmlIndexerProperties>
 4- It is possible to define a default namespace that is applied when the
 parser does not find any namespace in the document, or when the namespace
 found in the XML document does not match any namespace defined in
 xmlIndexerProperties.
 Example:
 <xmlIndexerProperties type="filePerDocument" namespace="default">
   <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
 </xmlIndexerProperties>





--

Rida Benjelloun
President and CEO
DocuLibre inc.
Telephone: (418) 262-3222
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]



[jira] Created: (NUTCH-185) XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between the XML elements and Lucene fields.

2006-01-24 Thread Rida Benjelloun (JIRA)
XMLParser is configurable plugin. It use XPath and namespaces to do the mapping 
between the XML elements and Lucene fields. 


 Key: NUTCH-185
 URL: http://issues.apache.org/jira/browse/NUTCH-185
 Project: Nutch
Type: New Feature
  Components: fetcher, indexer  
Versions: 0.7.2-dev
 Environment: OS Independent
Reporter: Rida Benjelloun


XMLParser is a configurable plugin. It uses XPath and namespaces to map
XML elements to Lucene fields.

Information:

1- Copy xmlparser-conf.xml to the nutch/conf directory.

2- To index your custom XML file, you have to modify xmlparser-conf.xml.
This parser uses namespaces and XPath to parse XML content.
The config file maps XML nodes (selected with XPath) to Lucene fields.
Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />

3- The xmlIndexerProperties element encapsulates a set of fields associated
with a namespace.
If the namespace is found in the XML document, the fields defined for that
namespace will be indexed.
Example:
<xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
  <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" />
  <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" />
</xmlIndexerProperties>


4- It is possible to define a default namespace that is applied when the
parser does not find any namespace in the document, or when the namespace
found in the XML document does not match any namespace defined in
xmlIndexerProperties.
Example:
<xmlIndexerProperties type="filePerDocument" namespace="default">
  <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" />
</xmlIndexerProperties>


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



xml-parser plugin contribution

2006-01-23 Thread Rida Benjelloun
Hi,
I have developed an XML parser plugin. I have tested it with Nutch 0.7.2.
The parser uses namespaces and XPath to map XML nodes to Lucene fields.
I'm trying to send the source of the plugin in a zip file, but my message is
always rejected (it is treated as spam).
How can I send the source code?
Best regards.


Class MultiProperties

2006-01-17 Thread Rida Benjelloun
Hi all,
I'm using the Nutch 0.7.1 jar, and I'm not able to find the class
MultiProperties in the package org.apache.nutch.protocol.httpclient.
When I look at the Javadoc for version 0.7.1, this class exists.
Could you please help me?
Best regards.


OpenOffice and Excel parsers

2006-01-10 Thread Rida Benjelloun
Hi,

Is someone working on OpenOffice and Excel parsers? I have already
developed them in Lius (http://sourceforge.net/projects/lius) and I want to
adapt them for Nutch.

I have checked SVN and did not find an OpenOffice or Excel parser.

Best regards

Rida.