Re: Data Import handler and join select

2014-08-08 Thread Alejandro Marqués Rodríguez
First of all thank you very much for the answer, James. It is very complete
and it gives us several alternatives :)

I think we will try the cache approach first, as, after this problem
(https://issues.apache.org/jira/browse/SOLR-5954) was solved, performance
improved, so along with the cache solution we may reach the performance we
expect.

We've also tried modifying the transformers and we've got it working the
way we were looking for, though the solutions you propose seem to be much
cleaner.

Regarding indexing through SolrJ, it was our first idea. The problem is
that when we started the project DIH seemed to fit our needs perfectly,
until we tried with real data and discovered the performance issues, so now
it may be a bit late for us to change everything :( If we have no other
option we will go that way, but we need to try less drastic solutions
first.

Thanks!


2014-08-07 18:11 GMT+02:00 Dyer, James james.d...@ingramcontent.com:

 Alejandro,

 You can use a sub-entity with a cache using DIH.  This will solve the
 n+1-select problem and make it run quickly.  Unfortunately, the only
 built-in cache implementation is in-memory so it doesn't scale.  There is a
 fast, disk-backed cache using bdb-je, which I use in production.  See
 https://issues.apache.org/jira/browse/SOLR-2613 .  You will need to build
 this yourself and include it on the classpath, and obtain a copy of bdb-je
 from Oracle.  While bdb-je is open source, its license is incompatible with
 ASL so this will never officially be part of Solr.

 Once you have a disk-backed cache, you can specify it on the child entity
 like this:
 <entity name="parent" query="select id, ... from parent_table">
   <entity
     name="child"
     query="select foreignKey, ... from child_table"
     cacheKey="foreignKey"
     cacheLookup="parent.id"
     processor="SqlEntityProcessor"
     transformer="..."
     cacheImpl="BerkleyBackedCache"
   />
 </entity>

 If you don't want to go down this path, you can achieve this all with one
 query, if you include an ORDER BY to sort by whatever field is used as
 Solr's uniqueKey, and add a dummy row at the end with a UNION:

 SELECT p.uniqueKey, ..., 'A' as lastInd from PRODUCTS p
 INNER JOIN DESCRIPTIONS d ON p.uniqueKey = d.productKey
 UNION SELECT 0 as uniqueKey, ... , 'B' as lastInd from dual
 ORDER BY uniqueKey, lastInd

 Then your transformer would need to keep the last uniqueKey in an
 instance variable and a running map of everything it has seen for that
 key.  When the key changes, or on the last row, send that map as the
 document.  Otherwise, the transformer returns null.  This will collect data
 from each row seen onto one document.
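
 To make that concrete, here is a minimal sketch of such a transformer
 (class and field names are illustrative, not from any shipped code; it
 assumes rows arrive ordered as in the query above):

 import java.util.HashMap;
 import java.util.Map;

 import org.apache.solr.handler.dataimport.Context;
 import org.apache.solr.handler.dataimport.Transformer;

 public class CollapsingTransformer extends Transformer {
   private Object lastUniqueKey = null;
   private Map<String, Object> pending = new HashMap<String, Object>();

   @Override
   public Object transformRow(Map<String, Object> row, Context context) {
     Object key = row.get("uniqueKey");
     boolean lastRow = "B".equals(row.get("lastInd")); // the dummy UNION row
     Map<String, Object> finished = null;
     if (lastUniqueKey != null && (lastRow || !key.equals(lastUniqueKey))) {
       finished = pending;            // key changed: previous doc is complete
       pending = new HashMap<String, Object>();
     }
     if (!lastRow) {
       pending.putAll(row);           // fold this row's columns into the doc
       lastUniqueKey = key;
     }
     return finished;                 // null tells DIH to skip this row
   }
 }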

 Keep in mind also that in a lot of cases like this, it might just be
 easiest to write a program that uses SolrJ to send your documents rather
 than trying to make DIH's features fit your use case.
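
 For reference, a bare-bones SolrJ loader looks something like this (Solr
 4.x API; the URL and field names are placeholders for your own setup):

 import org.apache.solr.client.solrj.impl.HttpSolrServer;
 import org.apache.solr.common.SolrInputDocument;

 public class ProductLoader {
   public static void main(String[] args) throws Exception {
     HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/products");
     // A real loader would iterate a JDBC ResultSet here, collapsing the
     // description rows per product before sending each document.
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", "1");
     doc.addField("name", "Product");
     doc.addField("languages", "es");          // multivalued field
     doc.addField("languages", "en");
     doc.addField("description_es", "Descripción en español");
     doc.addField("description_en", "English description");
     solr.add(doc);
     solr.commit();
     solr.shutdown();
   }
 }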

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Alejandro Marqués Rodríguez [mailto:
 amarq...@paradigmatecnologico.com]
 Sent: Thursday, August 07, 2014 1:43 AM
 To: solr-user@lucene.apache.org
 Subject: Data Import handler and join select

 Hi,

 I have one problem when indexing with the data import handler while doing a
 join select. I have two tables, one with products and another one with
 descriptions for each product in several languages.

 So it would be:

 Products: ID, NAME, BRAND, PRICE, ...
 Descriptions: ID, LANGUAGE, DESCRIPTION

 I would like to have every product indexed as a document with a multivalued
 field, language, which contains every language that has an associated
 description, and several dynamic fields, description_*, one for each
 language.

 So it would be for example:

 Id: 1
 Name: Product
 Brand: Brand
 Price: 10
 Languages: [es,en]
 Description_es: Descripción en español
 Description_en: English description

 Our first approach was using sub-entities for the data import handler and
 after implementing some transformers we had everything indexed as we
 wanted. The sub-entity process added the descriptions for each language to
 the solr document and then indexed them.

 The problem was performance. I've read that using sub-entities hurts
 performance greatly, so we changed our process to use a join instead.

 Performance improved greatly this way, but now we have a problem. Each
 time a row is processed a Solr document is generated and indexed into Solr,
 but the data is not merged into the previously indexed document; it
 replaces it.

 If we take the previous example, the query resulting from the join would be:

 Id - Name - Brand - Price - Language - Description
 1 - Product - Brand - 10 - es - Descripción en español
 1 - Product - Brand - 10 - en - English description

 So when indexing, as both rows have the same id, the only information I
 keep is the second row.

 Is there any way for the data import handler to manage this and allow the
 documents to be indexed updating any previous data?

Data Import handler and join select

2014-08-07 Thread Alejandro Marqués Rodríguez
Hi,

I have one problem when indexing with the data import handler while doing a
join select. I have two tables, one with products and another one with
descriptions for each product in several languages.

So it would be:

Products: ID, NAME, BRAND, PRICE, ...
Descriptions: ID, LANGUAGE, DESCRIPTION

I would like to have every product indexed as a document with a multivalued
field, language, which contains every language that has an associated
description, and several dynamic fields, description_*, one for each
language.

So it would be for example:

Id: 1
Name: Product
Brand: Brand
Price: 10
Languages: [es,en]
Description_es: Descripción en español
Description_en: English description

Our first approach was using sub-entities for the data import handler and
after implementing some transformers we had everything indexed as we
wanted. The sub-entity process added the descriptions for each language to
the solr document and then indexed them.

The problem was performance. I've read that using sub-entities hurts
performance greatly, so we changed our process to use a join instead.

Performance improved greatly this way, but now we have a problem. Each
time a row is processed a Solr document is generated and indexed into Solr,
but the data is not merged into the previously indexed document; it
replaces it.

If we take the previous example, the query resulting from the join would be:

Id - Name - Brand - Price - Language - Description
1 - Product - Brand - 10 - es - Descripción en español
1 - Product - Brand - 10 - en - English description

So when indexing, as both rows have the same id, the only information I
keep is the second row.

Is there any way for data import handler to manage this and allow the
documents to be indexed updating any previous data?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Volatile spellcheck index

2014-02-05 Thread Alejandro Marqués Rodríguez
Hi,

I'm having a problem with the spell check index building. I've configured
the spell checker component to have the index built on optimize.

<!-- Spell Check: http://wiki.apache.org/solr/SpellCheckComponent -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>

  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">spell</str>
    <str name="accuracy">0.7</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<!-- A request handler for demonstrating the spellcheck component. See
     http://wiki.apache.org/solr/SpellCheckComponent for details -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">spellchecker</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

After the index process I launch an optimize request and the spellcheck
index is generated and everything is working fine. However, if I restart
Solr the spell check is not working anymore until I execute another
optimize request.

So, is this the expected way of working? Is the spell check index deleted
after every server restart? Is there any way to make it persistent?

And just one more question: I remember that in previous Solr versions the
spellcheck even had its own folder under the data folder, so, for example, I
could see whether the spellcheck index had been generated just by listing the
files under that folder. Does that folder still exist? Is there any way of
knowing if the spellcheck index has been generated without executing a
query that is supposed to return a correction?

Thanks in advance




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Volatile spellcheck index

2014-02-05 Thread Alejandro Marqués Rodríguez
Thanks for the answer, James. My fault for not specifying the Solr version;
we are working with Solr 4.5.

Anyway, thank you very much for pointing out the change to
DirectSolrSpellChecker. I hadn't even realized that change, and I think I
wasn't using it, as the line
<str name="classname">solr.DirectSolrSpellChecker</str> was missing in my
configuration. Once I changed it, everything seems to be working fine even
after server restart.

Thanks again James, you've saved me from some serious headache ;)



2014-02-05 Dyer, James james.d...@ingramcontent.com:

 Alejandro,

 Assuming you're using Solr 3.x, under:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
   ...
   </lst>
 </searchComponent>

 ...you can add:

 <str name="spellcheckIndexDir">./spellchecker</str>

 ...then the spell check index will be created on-disk and not in memory.

 But in Solr 4.0, the default spellcheck implementation changed to
 org.apache.solr.spelling.DirectSolrSpellChecker, which does not create a
 separate index for spellchecking; "build" does nothing, and you need
 not worry at all about these things.  The wiki still says "experimental"
 here, but that is woefully out-of-date.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Alejandro Marqués Rodríguez [mailto:
 amarq...@paradigmatecnologico.com]
 Sent: Wednesday, February 05, 2014 3:41 AM
 To: solr-user@lucene.apache.org
 Subject: Volatile spellcheck index

 Hi,

 I'm having a problem with the spell check index building. I've configured
 the spell checker component to have the index built on optimize.

 <!-- Spell Check: http://wiki.apache.org/solr/SpellCheckComponent -->
 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">spell</str>

   <lst name="spellchecker">
     <str name="name">spellchecker</str>
     <str name="field">spell</str>
     <str name="accuracy">0.7</str>
     <str name="buildOnOptimize">true</str>
   </lst>
 </searchComponent>

 <!-- A request handler for demonstrating the spellcheck component. See
      http://wiki.apache.org/solr/SpellCheckComponent for details -->
 <requestHandler name="/spell" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck.dictionary">spellchecker</str>
     <str name="spellcheck">on</str>
     <str name="spellcheck.onlyMorePopular">false</str>
     <str name="spellcheck.extendedResults">false</str>
     <str name="spellcheck.count">1</str>
   </lst>
   <arr name="last-components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>

 After the index process I launch an optimize request and the spellcheck
 index is generated and everything is working fine. However, if I restart
 Solr the spell check is not working anymore until I execute another
 optimize request.

 So, is this the expected way of working? Is the spell check index deleted
 after every server restart? Is there any way to make it persistent?

 And just one more question: I remember that in previous Solr versions the
 spellcheck even had its own folder under the data folder, so, for example, I
 could see whether the spellcheck index had been generated just by listing the
 files under that folder. Does that folder still exist? Is there any way of
 knowing if the spellcheck index has been generated without executing a
 query that is supposed to return a correction?

 Thanks in advance




 --
 Alejandro Marqués Rodríguez

 Paradigma Tecnológico
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Is there any limit how many documents can be indexed by apache solr

2013-11-26 Thread Alejandro Marqués Rodríguez
Hi,

In Lucene you are supposed to be able to index up to about 2.1 billion
documents per index, the Int32 document-number limit
( http://lucene.apache.org/core/3_0_3/fileformats.html#Limitations ), and in
Solr it should be something like that. Anyway, the maximum is vastly bigger
than those 11,000 ;)

Could it be that you are reusing IDs so the new documents overwrite the old
ones?


2013/11/26 Kamal Palei palei.ka...@gmail.com

 Dear All
 I am using Apache Solr 3.6.2 with Drupal 7.
 Users keep adding their profiles (resumes) and, with a cron task from
 Drupal, the documents get indexed.

 Recently I observed that after indexing around 11,000 documents, further
 documents do not get indexed.

 Is there any configuration for the maximum number of documents that can be
 indexed?

 Kindly help.

 Thanks
 kamal




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi,

I've recently been asked to implement an application to search products from
several stores, each store having different prices and stock for the same
product.

So I have products that have the usual fields (name, description, brand,
etc.) and also number of units and price for each store. I must be able to
filter by a given store and order by stock or price for that store. The
application should also allow increasing the number of stores,
store-dependent fields and number of products without much work.

The numbers for the application are more or less 100 stores and 7M products.

I've been thinking of some ways of defining the index structure, but I don't
know which one is better, as I think each one has its pros and cons.


   1. *Each product-store as a document:* Denormalizing the information so
   for every product and store I have a different document. Pros are that I
   can filter and order without problems and that adding a new store-dependent
   field is very easy. Cons are that the index goes from 7M documents to 700M
   and that most of the info is redundant, as most of the fields are repeated
   among stores.
   2. *Each field-store as a field:* For example, for price I would have
   store1_price, store2_price, ... Pros are that the index stays at 7M
   documents, and I can still filter and sort by those fields. Cons are that I
   have to add some logic so that if I filter by one store I order by the
   associated price field, and that the number of fields grows as number of
   store-dependent fields x number of stores. I don't know if having more
   fields affects performance, but adding new store-dependent fields will
   increase the number of fields even more.
   3. *Join:* The first time I read about Solr joins I thought they were the
   way to go in this case, but after reading a bit more and doing some tests
   I'm not so sure about it... Maybe I've done it wrong, but I think it also
   denormalizes the info (so I will also have 700M documents) and besides I
   can't order or filter by store fields.


I must say my preferred option is number 2, so I don't duplicate
information, I keep a relatively small number of documents, and I can filter
and sort by the store fields. However, my main concern is that I don't know
whether having too many fields in a document will harm performance.

Which one do you think is the best approach for this application? Is there
a better approach that I have missed?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi Robert,

That was the idea, dynamic fields, so, as you said, it is easier to sort
and filter. Besides, with dynamic fields it will be easier to add new
stores, as I won't have to modify the schema :)
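
Just to sketch the extra bit of logic from option 2, choosing the sort
field from the selected store would look something like this in SolrJ (a
sketch; the store and field names are assumptions, not our actual code):

import org.apache.solr.client.solrj.SolrQuery;

// Given the store chosen by the user, filter on it and sort by the
// matching dynamic price field (e.g. store1 -> store1_price).
String store = "store1";
SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("store:" + store);
query.setSort(store + "_price", SolrQuery.ORDER.asc);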

Thanks for the answer!


2013/11/21 Petersen, Robert robert.peter...@mail.rakuten.com

 Hi,

 I'd go with (2) also but using dynamic fields so you don't have to define
 all the storeX_price fields in your schema but rather just one *_price
 field.  Then when you filter on store:store1 you'd know to sort with
 store1_price and so forth for units.  That should be pretty straightforward.

 Hope that helps,
 Robi

 -Original Message-
 From: Alejandro Marqués Rodríguez [mailto:
 amarq...@paradigmatecnologico.com]
 Sent: Thursday, November 21, 2013 1:36 AM
 To: solr-user@lucene.apache.org
 Subject: Best implementation for multi-price store?

 Hi,

 I've recently been asked to implement an application to search products from
 several stores, each store having different prices and stock for the same
 product.

 So I have products that have the usual fields (name, description, brand,
 etc.) and also number of units and price for each store. I must be able to
 filter by a given store and order by stock or price for that store. The
 application should also allow increasing the number of stores,
 store-dependent fields and number of products without much work.

 The numbers for the application are more or less 100 stores and 7M
 products.

 I've been thinking of some ways of defining the index structure, but I
 don't know which one is better, as I think each one has its pros and cons.


    1. *Each product-store as a document:* Denormalizing the information so
    for every product and store I have a different document. Pros are that I
    can filter and order without problems and that adding a new
    store-dependent field is very easy. Cons are that the index goes from 7M
    documents to 700M and that most of the info is redundant, as most of the
    fields are repeated among stores.
    2. *Each field-store as a field:* For example, for price I would have
    store1_price, store2_price, ... Pros are that the index stays at 7M
    documents, and I can still filter and sort by those fields. Cons are that
    I have to add some logic so that if I filter by one store I order by the
    associated price field, and that the number of fields grows as number of
    store-dependent fields x number of stores. I don't know if having more
    fields affects performance, but adding new store-dependent fields will
    increase the number of fields even more.
    3. *Join:* The first time I read about Solr joins I thought they were the
    way to go in this case, but after reading a bit more and doing some tests
    I'm not so sure about it... Maybe I've done it wrong, but I think it also
    denormalizes the info (so I will also have 700M documents) and besides I
    can't order or filter by store fields.


 I must say my preferred option is number 2, so I don't duplicate
 information, I keep a relatively small number of documents, and I can filter
 and sort by the store fields. However, my main concern is that I don't know
 whether having too many fields in a document will harm performance.

 Which one do you think is the best approach for this application? Is there
 a better approach that I have missed?

 Thanks in advance



 --
 Alejandro Marqués Rodríguez

 Paradigma Tecnológico
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


SolrCloud recovery issue while search stress test

2013-11-12 Thread Alejandro Marqués Rodríguez
Hi,

We've been experiencing some problems during search stress tests, and we
don't even have a clue why this is happening.

We have the following:
- 3 servers
- Websphere 7
- Zookeeper 3.4.5 on each server
- Solr 4.5.0 on each server
- 1 shard (so it is one leader and 2 replicas)
- The index contains 7M documents (About 2GB)

We've run several stress tests with JMeter with 100-500 concurrent threads.
Depending on how many threads, we get different scenarios, but apart from
timings and whether the system fully recovers or not, the steps are:


   1. The Solr instances begin responding to queries, with a stable number
   of threads for each Solr (less than 10).
   2. Once the test has been running for several minutes we kill one of the
   Solrs (most of the time the one that is the leader).
   3. The remaining Solrs respond to the queries, slightly increasing the
   number of threads used.
   4. After a few minutes we restart the killed Solr again (and here is
   where our problem starts).
   5. Once it starts, it begins increasing the number of threads used (up to
   100 or above), and the worst thing is that even the other two Solrs start
   responding slowly (or not responding at all). Then, depending on the
   number of concurrent queries: if there are few, in more or less 3 minutes
   everything goes back to normal (though almost no queries are served
   during that period); if there are more than 200 concurrent queries, the
   restarted server uses so many threads that it crashes.

During the minutes that the three solrs are not responding there are no
logs, and after making a thread dump we've seen a lot of stalled threads
with sun.misc.Unsafe.park traces.

I don't understand this behaviour at all: not only does it work better with
two Solrs than after restarting the third, but the restart also affects the
behaviour of the two remaining Solrs...

Does anybody have any clue about this?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


SolrCloud full index replication on leader failure

2013-10-30 Thread Alejandro Marqués Rodríguez
Hi,

I have a problem with SolrCloud in a specific test case, and I wanted to
know if this is the way it should work or if there is any way to avoid it...

I have the next scenario:

- Three machines
- Each one with one zookeeper and one solr 4.1.0
- Each Solr stores 7 Million documents and the index is 2GB

The test consists of sending queries to Solr (100 concurrent queries
continuously) and then forcing the leader to fail by shutting down both its
ZooKeeper and its Solr.

When we shut down any Solr that is not the leader there are no problems;
the other two respond to the queries without trouble. However, if we shut
down the leader, the following happens:

- Both Solrs continue responding to the queries until the leader election
starts.
- One of them is elected as leader and the other one stops responding to
queries (I've read it goes into recovery mode until its index is
synchronized with the leader's).
- Then, even though both indexes are the same (they were synchronized
before the leader failure), the whole index is replicated.
- While the 2GB are replicated from the leader to the remaining server, the
recovering server does not respond to queries, so the leader must attend
the whole amount of queries and finally crashes from having too many
queries to answer (aside from replicating its index).

My question here is... Is it normal for the whole index to be replicated on
a leader change even though the leader's index and the other Solr's index
should be the same? Is there any way to avoid it? Maybe I have some
configuration wrong? Would upgrading Solr to 4.5.x avoid this behaviour?

Aside from this problem everything seems to work fine, but that point of
failure is too risky for us.

Thanks in advance


-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Query help

2010-07-16 Thread Alejandro Marqués Rodríguez
I can't see a way of retrieving five results from one type and five from
another in a single query. The only way I can think of that would have a
similar behaviour would be:

?q=ContentType:(News+OR+Analysis)&sort=DatePublished+desc&start=0&rows=10

This way you'll have the first 10 results being News or Analysis, though it
could be 7 News and 3 Analysis, or even 10 and 0...

If you need Solr to return 5 results from each type, I think the only way
to improve the search speed would be to make two parallel queries instead
of a single one.
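
For what it's worth, the parallel version could be as simple as this SolrJ
sketch (a sketch only; the server URL is assumed and error handling is
omitted):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ParallelTypeQueries {
  public static void main(String[] args) throws Exception {
    final SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // Fire both type-restricted queries concurrently, five rows each.
    Future<QueryResponse> news = pool.submit(new Callable<QueryResponse>() {
      public QueryResponse call() throws Exception {
        return solr.query(new SolrQuery("ContentType:News")
            .setSortField("DatePublished", SolrQuery.ORDER.desc).setRows(5));
      }
    });
    Future<QueryResponse> analysis = pool.submit(new Callable<QueryResponse>() {
      public QueryResponse call() throws Exception {
        return solr.query(new SolrQuery("ContentType:Analysis")
            .setSortField("DatePublished", SolrQuery.ORDER.desc).setRows(5));
      }
    });
    System.out.println(news.get().getResults());
    System.out.println(analysis.get().getResults());
    pool.shutdown();
  }
}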

Regards


2010/7/15 Rupert Bates rupert.ba...@guardian.co.uk

 Sorry, my mistake, the example should have been as follows:

 ?q=ContentType:News&sort=DatePublished+desc&start=0&rows=5
 ?q=ContentType:Analysis&sort=DatePublished+desc&start=0&rows=5

 Rupert

 On 15 July 2010 13:02, kenf_nc ken.fos...@realestate.com wrote:
 
  Your example though doesn't show different ContentType, it shows a
 different
  sort order. That would be difficult to achieve in one call. Sounds like
 your
  best bet is asynchronous (multi-threaded) calls if your architecture will
  allow for it.
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Query-help-tp969075p969334.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



 --
 Rupert Bates

 Software Development Manager
 Guardian News and Media

 Tel: 020 3353 3315
 rupert.ba...@guardian.co.uk




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: partial word searching

2010-04-21 Thread Alejandro Marqués Rodríguez
Hi,

You can use wildcards, but I suppose it would only work with one word
(though maybe if you use tokenization you could use something like
field:sun* AND field:hot*).

You could also use N-grams to achieve partial searches. For example, if you
use 3-grams for "hotel" you'll index "hot", "ote" and "tel", so you could
find "hotel" by searching for any of those three strings.

There's an N-gram filter you could apply, though I don't know how it works
when retrieving N-grams from a more-than-one-word expression:

<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>

Again, I suppose that if you use a tokenizer first you would get the 3-grams
"hot", "ote", "tel", "sun", "unw", "nwa" and "way", and therefore searching
for field:sun AND field:hot would retrieve the "Sunway Hotel" document.
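
To make the tokenized case concrete, this tiny plain-Java snippet (no Solr
API involved) prints the 3-grams that a minGramSize of 3 would produce for
each token:

public class NGramDemo {
  public static void main(String[] args) {
    for (String token : "sunway hotel".split(" ")) {
      for (int i = 0; i + 3 <= token.length(); i++) {
        System.out.println(token.substring(i, i + 3));
      }
    }
    // prints: sun, unw, nwa, way, hot, ote, tel
  }
}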

Regards



2010/4/21 Chamnap Chhorn chamnapchh...@gmail.com

 Hi everyone,


 I'm quite new to Solr 1.4. I have a requirement to be able to search
 partial words ("sun hot" = "Sunway Hotel") and to search full words
 ("sunway hotel" = "Sunway Hotel"). Currently, I am only able to search
 full words. Anyone have any suggestions?

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: Stemming - disable at query time - reg.

2010-04-19 Thread Alejandro Marqués Rodríguez




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42