Re: Start With and contain search

2014-12-02 Thread melb
Thanks, I think the NGramFilterFactory is the right filter; I will try it
today.
But what if I want to search for the query *dom host* (with a space) and get the
result *domhost*?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172031.html
Sent from the Solr - User mailing list archive at Nabble.com.


Get list of collection

2014-12-02 Thread Ankit Jain
Hi All,

I have a requirement to get the list of *collections* available in Solr. We
are using the SolrJ library.

I am able to fetch the list of cores, but I have not found a way to fetch the
list of collections.

Below is the sample code that I am using to fetch the cores:

CoreAdminRequest request = new CoreAdminRequest();
request.setAction(CoreAdminAction.STATUS);
CoreAdminResponse cores = request.process(server);

// List of the cores
List<String> coreList = new ArrayList<String>();
for (int i = 0; i < cores.getCoreStatus().size(); i++) {
coreList.add(cores.getCoreStatus().getName(i));
}

Please help.

-- 
Thanks,
Ankit Jain


Slow queries

2014-12-02 Thread melb
Hi,

I have a Solr collection with 16 million documents, growing daily with
1 documents.
Recently it has become slow to answer my requests (several seconds),
especially when I use multi-word queries.
I am running Solr on a machine with 32 GB of RAM, but it is a heavily used one.

What are my options to optimize the collection and speed up querying it?
Is this normal for this volume of data? Is sharding a good solution?

regards,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow queries

2014-12-02 Thread Siegfried Goeschl
If your performance was fine but degraded over time, it might be 
easiest to check / increase the memory to get better disk caching.


Cheers,

Siegfried Goeschl


On 02.12.14 09:27, melb wrote:

Hi,

I have a solr collection with 16 millions documents and growing daily with
1 documents
recently it is becoming slow to answer my request ( several seconds)
specially when I use multi-words query
I am running solr on a machine with 32G RAM but heavy used one

What are my options to optimize the collection and speed up querying it
is it normal with this volume of data? is sharding is a good solution?

regards,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Slow queries

2014-12-02 Thread melb
Yes, performance degraded over time. I can raise the memory, but I can't
do it every time, and the volume will keep growing.
Is it better to put Solr on a dedicated machine?
Is there anything else that can be done to the Solr instance, for example
dividing the collection?

rgds,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different update handlers for auto commit configuration

2014-12-02 Thread danny teichthal
Thanks for the clarification, I indeed mixed it with UpdateRequestHandler.

On Mon, Dec 1, 2014 at 11:24 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : I thought that the auto commit is per update handler because they are
 : configured within the update handler tag.

 updateHandler is not the same thing as a requestHandler that does
 updates.

 there can be many Update request handlers configured, but there is only
 ever one <updateHandler/> in a SolrCore.



 -Hoss
 http://www.lucidworks.com/



Replication of a corrupt master index

2014-12-02 Thread Charra, Johannes

Hi,

If I have a master/slave setup and the master index gets corrupted, will the 
slaves realize they should not replicate from the master anymore, since the 
master does not have a newer index version?

I'm using Solr version 4.2.1.

Regards,
Johannes




Re: SOLR not starting after restart 2 node cloud setup

2014-12-02 Thread Doss
Dear Erick,

Thanks for your thoughts, it helped me a lot. In my instances, no Solr logs
were being appended to catalina.out.

I have now placed the log4j.properties file, so Solr logs are captured in the
solr.log file, and with its help I found the reason for the issue.

I was starting Tomcat with the option -Dbootstrap_conf=true, which made Solr
look for the core configuration files in the wrong directory. After removing
this option it started without any issues.

I also commented out the suggester component, which made Solr load faster.

Thanks,
Doss.




On Thu, Nov 20, 2014 at 9:47 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Doss:

 Tomcat often puts things in catalina.out, you might check there,
 I've often seen logging information from Solr go there by
 default.

 Without having some idea what kinds of problems Solr is
 reporting when you see this situation, it's really hard to say.

 Some things I'd check first though, in order of what
 I _guess_ is most likely.

  There have been anecdotal reports (in fact, I'm trying
 to understand the why of it right now) of the suggester
 taking a long time to initialize, even if you don't use it!
 So if you're not using the suggest component, try
 commenting out those sections in solrconfig.xml for
 the cores in question. I like this explanation since it
 fits with your symptoms, but I don't like it since the
 index you are using isn't all that big. So it's something
 of a shot in the dark. I expect that the core will
 _eventually_ come up, but I've seen reports of 10-15
 minutes being required, far beyond my patience! That
 said, this would also explain why deleting the index
 works.

  OutOfMemory errors. You might be able to attach
 jConsole (part of the standard Java stuff) to the process
 and monitor the memory usage. If it's being pushed near
 the 5G limit that's the first thing I'd suspect.

  If you're using the default setups, then the Zookeeper
 timeout may be too low, I think the default (not sure about
 whether it's been changed in 4.9) is 15 seconds, 30-60
 is usually much better.

 Best,
 Erick


 On Thu, Nov 20, 2014 at 3:47 AM, Doss itsmed...@gmail.com wrote:
  Dear Erick,
 
  Forgive my ignorance.
 
  Please find some of the details you required.
 
  *have you looked at the solr logs?*
 
Sorry I haven't defined the log4j.properties file, so I don't have
 solr
  logs. Since it requires tomcat restart I am planning to do it in next
  restart.
 
  But found the following in tomcat log
 
  18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2]
  org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The
 web
  application [/mima] appears to have started a thread named
  [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to
  stop it. This is very likely to create a memory leak. Stack trace of
 thread:
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
   org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
 
 
  *How big are the cores?*
 
  We have 16 cores, out of it only 5 are big ones. Total size of all 16
  cores is 10+ GB
 
  *How many docs in the cores when the problem happens?*
 
  1 core with 163 fields and 33,00,000 documents (Index size 2+ GB)
   4 cores with 3 fields and has 150,00,000 (approx) documents (1.2 to 1.5
 GB)
  remaining cores are 1,00,000 to 40,00,000 documents
 
  *How much memory are you allocating the JVM? *
 
  5GB for JVM, Total RAM available in the systems is 30 GB
 
  *can you restart Tomcat without a problem?*
 
  This problem is occurring in production, I never tried.
 
 
  Thanks,
  Doss.
 
 
  On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  You've really got to provide details for us to say much
  of anything. There are about a zillion things that it could be.
 
  In particular, have you looked at the solr logs? Are there
  any interesting things in them? How big are the cores?
  How much memory are you allocating the JVM? How
  many docs in the cores when the problem happens?
  Before the nodes stop responding, can you restart
  Tomcat without a problem?
 
  You might review:
  http://wiki.apache.org/solr/UsingMailingLists
 
  Best,
  Erick
 
 
  On Wed, Nov 19, 2014 at 1:04 AM, Doss itsmed...@gmail.com wrote:
   I have two node SOLR (4.9.0) cloud with Tomcat (8), Zookeeper. At
 times
   SOLR in Node 1 stops responding, to fix the issue I am restarting
 tomcat
  in
   Node 1, but SOLR not starting up, but if I remove the solr cores in
 both
   nodes and try restarting it starts working, and then I have to reindex
  the
   whole data again. We are using this setup in production because of
 this
   issue we 

Re: Getting the position of a word via Solr API

2014-12-02 Thread adfel70
Small update:
I have managed to get the Term Vector component to work, and I am getting all
the words of the text field.

The problem is that it doesn't work with several words combined; I can't
find the offset at which the needed expression starts...

Any ideas, anyone?

Thanks!
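For reference, the per-term positions and offsets can be requested roughly like this
with SolrJ (a sketch only; it assumes the stock /tvrh handler with the
TermVectorComponent, a field named text indexed with termVectors, termPositions and
termOffsets, and an existing SolrServer named server). A multi-word expression then
has to be stitched together client-side from the offsets of its individual terms:

SolrQuery q = new SolrQuery("id:SOME_DOC_ID");   // placeholder document id
q.setRequestHandler("/tvrh");
q.set("tv.fl", "text");
q.set("tv.positions", "true");
q.set("tv.offsets", "true");

QueryResponse rsp = server.query(q);
// The component returns a raw structure under "termVectors":
// roughly termVectors -> <doc key> -> text -> <term> -> positions / offsets
NamedList<?> tv = (NamedList<?>) rsp.getResponse().get("termVectors");
System.out.println(tv);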



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-the-position-of-a-word-via-Solr-API-tp4171877p4172092.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Start With and contain search

2014-12-02 Thread Alexandre Rafalovitch
It's not clear what you actually mean with that space. Do you mean any
two words should try to match as if they were one? What's the
business-level description of what you are trying to do?

Also, you are not reinventing https://domainr.com/ , are you? If you
are, search around, I think they had some technical architecture
description somewhere in early articles.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 03:15, melb melaggo...@gmail.com wrote:
 Thanks, I think the  NgramFitlerFactory is the good filter, I will try it
 today
 but what if I want to search for query :* dom host *and get the result:
 *domhost*



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172031.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Start With and contain search

2014-12-02 Thread melb
Yes, this is exactly what I am trying to do, but with a less extensive database.
Can I do it with Solr?
rgds,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172105.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Start With and contain search

2014-12-02 Thread Alexandre Rafalovitch
Well, if all you are doing is substring searches, then Solr could be
overkill. But if you are doing a search and then want to do faceting or
additional queries, then Solr is a good bet.

And yes, it can do it, you just need to really understand your input
patterns and what you want to find with them.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 09:20, melb melaggo...@gmail.com wrote:
 Yes this is exactly what I am trying to do  but with less extended database
 can I do it with solr?
 rgds,



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172105.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get list of collection

2014-12-02 Thread Erick Erickson
I think you want CloudSolrServer.getCollectionList()

Best,
Erick
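If that method is not available in your SolrJ version, a minimal sketch that reads
the collection names straight from the cluster state could look like this (assuming
a SolrCloud setup and the 4.x ZkStateReader API; the ZooKeeper address below is a
placeholder):

// imports: java.util.Set, org.apache.solr.client.solrj.impl.CloudSolrServer
CloudSolrServer cloud = new CloudSolrServer("zkhost1:2181,zkhost2:2181"); // placeholder ZK ensemble
cloud.connect();  // forces the cluster state to be read from ZooKeeper

// The cluster state lists every collection in the cloud, not just local cores.
Set<String> collections = cloud.getZkStateReader().getClusterState().getCollections();
for (String name : collections) {
    System.out.println(name);
}
cloud.shutdown();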

On Tue, Dec 2, 2014 at 12:27 AM, Ankit Jain ankitjainc...@gmail.com wrote:
 Hi All,

 I have a requirement to get the list of *collection* available in Solr. we
 are using solrj library.

 I am able to fetch the list of cores but not getting ways to fetch the list
 of collections.

 Below is the sample example, that i am using to fetch the cores:

 CoreAdminRequest request = new CoreAdminRequest();
 request.setAction(CoreAdminAction.STATUS);
 CoreAdminResponse cores = request.process(server);

 // List of the cores
 List<String> coreList = new ArrayList<String>();
 for (int i = 0; i < cores.getCoreStatus().size(); i++) {
 coreList.add(cores.getCoreStatus().getName(i));
 }

 Please help.

 --
 Thanks,
 Ankit Jain


Re: Slow queries

2014-12-02 Thread Erick Erickson
bq: Is it better to put the solr on dedicated machine?

Yes, absolutely. Solr _likes_ memory, and on a
machine with lots of other processes you'll keep
running into this problem.

FWIW, I've seen between 10M and 300M docs fit into
16G for the JVM. But see Uwe's excellent blog on MMapDirectory
and not over-allocating memory to the JVM here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Also see:
https://wiki.apache.org/solr/SolrPerformanceProblems
and
http://wiki.apache.org/solr/SolrPerformanceFactors

Best,
Erick

On Tue, Dec 2, 2014 at 1:02 AM, melb melaggo...@gmail.com wrote:
 Yes  performance degraded over the time, I can raise the memory but I can't
 do it every time and the volume will keep growing
 Is it better to put the solr on dedicated machine?
 Is there any thing else that can be done to the solr instance for example
 deviding the collection?

 rgds,



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication of a corrupt master index

2014-12-02 Thread Erick Erickson
No. The master is the master and will always stay the master
unless you change it. This is one of the reasons I really like
to keep the original source around in case I ever have this
problem.

Best,
Erick

On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes
johannes.charrahorstm...@haufe-lexware.com wrote:

 Hi,

 If I have a master/slave setup and the master index gets corrupted, will the 
 slaves realize they should not replicate from the master anymore, since the 
 master does not have a newer index version?

 I'm using Solr version 4.2.1.

 Regards,
 Johannes




Re: SOLR not starting after restart 2 node cloud setup

2014-12-02 Thread Erick Erickson
Glad you found a solution!

Best,
Erick

On Tue, Dec 2, 2014 at 4:30 AM, Doss itsmed...@gmail.com wrote:
 Dear Erick,

 Thanks for your thoughts, it helped me a lot. In my instances no solr logs
 are appended in to catalina.out.

 Now I placed the log4j.properties file. Solr logs are captured in solr.log
 file with the help of it I found the reason for the issue.

 I am starting tomcat with the option -Dbootstrap_conf=true which made solr
 to look for core configuration files in a wrong directory, after removing
 this it started without any issues.

 I also commented suggester component which made solr to load fast.

 Thanks,
 Doss.




 On Thu, Nov 20, 2014 at 9:47 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Doss:

 Tomcat often puts things in catalina.out, you might check there,
 I've often seen logging information from Solr go there by
 default.

 Without having some idea what kinds of problems Solr is
 reporting when you see this situation, it's really hard to say.

 Some things I'd check first though, in order of what
 I _guess_ is most likely.

  There have been anecdotal reports (in fact, I'm trying
 to understand the why of it right now) of the suggester
 taking a long time to initialize, even if you don't use it!
 So if you're not using the suggest component, try
 commenting out those sections in solrconfig.xml for
 the cores in question. I like this explanation since it
 fits with your symptoms, but I don't like it since the
 index you are using isn't all that big. So it's something
 of a shot in the dark. I expect that the core will
 _eventually_ come up, but I've seen reports of 10-15
 minutes being required, far beyond my patience! That
 said, this would also explain why deleting the index
 works.

  OutOfMemory errors. You might be able to attach
 jConsole (part of the standard Java stuff) to the process
 and monitor the memory usage. If it's being pushed near
 the 5G limit that's the first thing I'd suspect.

  If you're using the default setups, then the Zookeeper
 timeout may be too low, I think the default (not sure about
 whether it's been changed in 4.9) is 15 seconds, 30-60
 is usually much better.

 Best,
 Erick


 On Thu, Nov 20, 2014 at 3:47 AM, Doss itsmed...@gmail.com wrote:
  Dear Erick,
 
  Forgive my ignorance.
 
  Please find some of the details you required.
 
  *have you looked at the solr logs?*
 
Sorry I haven't defined the log4j.properties file, so I don't have
 solr
  logs. Since it requires tomcat restart I am planning to do it in next
  restart.
 
  But found the following in tomcat log
 
  18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2]
  org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The
 web
  application [/mima] appears to have started a thread named
  [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to
  stop it. This is very likely to create a memory leak. Stack trace of
 thread:
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
   org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
 
 
  *How big are the cores?*
 
  We have 16 cores, out of it only 5 are big ones. Total size of all 16
  cores is 10+ GB
 
  *How many docs in the cores when the problem happens?*
 
  1 core with 163 fields and 33,00,000 documents (Index size 2+ GB)
   4 cores with 3 fields and has 150,00,000 (approx) documents (1.2 to 1.5
 GB)
  remaining cores are 1,00,000 to 40,00,000 documents
 
  *How much memory are you allocating the JVM? *
 
  5GB for JVM, Total RAM available in the systems is 30 GB
 
  *can you restart Tomcat without a problem?*
 
  This problem is occurring in production, I never tried.
 
 
  Thanks,
  Doss.
 
 
  On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  You've really got to provide details for us to say much
  of anything. There are about a zillion things that it could be.
 
  In particular, have you looked at the solr logs? Are there
  any interesting things in them? How big are the cores?
  How much memory are you allocating the JVM? How
  many docs in the cores when the problem happens?
  Before the nodes stop responding, can you restart
  Tomcat without a problem?
 
  You might review:
  http://wiki.apache.org/solr/UsingMailingLists
 
  Best,
  Erick
 
 
  On Wed, Nov 19, 2014 at 1:04 AM, Doss itsmed...@gmail.com wrote:
   I have two node SOLR (4.9.0) cloud with Tomcat (8), Zookeeper. At
 times
   SOLR in Node 1 stops responding, to fix the issue I am restarting
 tomcat
  in
   Node 1, but SOLR not starting up, but if I remove the solr cores in
 both
   nodes and try restarting it starts working, and 

Re: Slow queries

2014-12-02 Thread Siegfried Goeschl
It might be a good idea to

* move SOLR to a dedicated box :-)
* load your SOLR server with 20.000.000 documents (the estimated number of 
documents after three years) and do performance testing & tuning

Afterwards you have some hard facts about hardware sizing and expected 
performance for the next three years :-)

Cheers,

Siegfried Goeschl

 On 02 Dec 2014, at 10:02, melb melaggo...@gmail.com wrote:
 
 Yes  performance degraded over the time, I can raise the memory but I can't
 do it every time and the volume will keep growing
 Is it better to put the solr on dedicated machine?
 Is there any thing else that can be done to the solr instance for example
 deviding the collection?
 
 rgds,
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication of a corrupt master index

2014-12-02 Thread Charra, Johannes
Thanks for your response, Erick. 

Do you think it is possible to corrupt an index merely with HTTP requests? I've 
been using the aforementioned m/s setup for years now and have never seen a 
master failure.

I'm trying to think of scenarios where this setup (1 master, 4 slaves) might 
have a total outage. The master runs on a h/a cluster.

Regards,
Johannes

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, 2 December 2014 15:54
To: solr-user@lucene.apache.org
Subject: Re: Replication of a corrupt master index

No. The master is the master and will always stay the master unless you change 
it. This is one of the reasons I really like to keep the original source around 
in case I every have this problem.

Best,
Erick

On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes 
johannes.charrahorstm...@haufe-lexware.com wrote:

 Hi,

 If I have a master/slave setup and the master index gets corrupted, will the 
 slaves realize they should not replicate from the master anymore, since the 
 master does not have a newer index version?

 I'm using Solr version 4.2.1.

 Regards,
 Johannes




Find duplicates

2014-12-02 Thread Peter Kirk
Hi

Is it possible to formulate a Solr query which finds all documents which have 
the same value in a particular field?
Note, I don't know what the value is, I just want to find all documents with 
duplicate values.

For example, I have 5 documents:

Doc1: field Name = Peter
Doc2: field Name = Jack
Doc3: field Name = Peter
Doc4: field Name = Paul
Doc5: field Name = Jack


If I executed the query, it would find documents Doc1 and Doc3 (Peter is the 
same), and Doc2 and Doc5 (Jack is the same).



Thanks,
Peter


Re: Find duplicates

2014-12-02 Thread Erik Hatcher
Sort of… if you indexed the full value of the field (and you’re looking for 
truly exact matches) as a string field type you could facet on that field with 
facet.mincount=2 and the facets returned would be the ones with duplicate 
values.  You’d have to drill down on each of the facets returned to find the 
actual docs.

Erik
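A rough SolrJ sketch of that approach, assuming a string field named Name (as in
the example) and an already-constructed SolrServer called server:

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);               // only the facet counts are needed, not the docs
q.setFacet(true);
q.addFacetField("Name");
q.setFacetMinCount(2);      // only values shared by 2 or more documents
q.setFacetLimit(-1);        // return every duplicated value

QueryResponse rsp = server.query(q);
for (FacetField.Count c : rsp.getFacetField("Name").getValues()) {
    // c.getName() is the duplicated value, c.getCount() how many docs share it;
    // drill down with a follow-up query on Name:"<value>" to fetch the docs.
    System.out.println(c.getName() + " -> " + c.getCount());
}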

 On Dec 2, 2014, at 10:57 AM, Peter Kirk p...@alpha-solutions.dk wrote:
 
 Hi
 
 Is it possible to formulate a Solr query which finds all documents which have 
 the same value in a particular field?
 Note, I don't know what the value is, I just want to find all documents with 
 duplicate values.
 
 For example, I have 5 documents:
 
 Doc1: field Name = Peter
 Doc2: field Name = Jack
 Doc3: field Name = Peter
 Doc4: field Name = Paul
 Doc5: field Name = Jack
 
 
 If I executed the query, it would find documents Doc1 and Doc3 (Peter is the 
 same), and Doc2 and Doc5 (Jack is the same).
 
 
 
 Thanks,
 Peter



RE: Find duplicates

2014-12-02 Thread Gonzalo Rodriguez
Have you tried using result grouping for your query? There are some very good 
examples in the wiki:

https://wiki.apache.org/solr/FieldCollapsing


Gonzalo
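As a sketch of the grouping variant in SolrJ (again assuming a string field Name
and an existing SolrServer named server; groups whose numFound is greater than 1
are the duplicates):

SolrQuery q = new SolrQuery("*:*");
q.set("group", "true");
q.set("group.field", "Name");
q.set("group.limit", "10");   // documents returned per group

QueryResponse rsp = server.query(q);
for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
    for (Group g : cmd.getValues()) {
        if (g.getResult().getNumFound() > 1) {   // more than one doc shares this value
            System.out.println(g.getGroupValue() + " : " + g.getResult().getNumFound() + " docs");
        }
    }
}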

-Original Message-
From: Peter Kirk [mailto:p...@alpha-solutions.dk] 
Sent: Tuesday, December 02, 2014 9:58 AM
To: solr-user@lucene.apache.org
Subject: Find duplicates

Hi

Is it possible to formulate a Solr query which finds all documents which have 
the same value in a particular field?
Note, I don't know what the value is, I just want to find all documents with 
duplicate values.

For example, I have 5 documents:

Doc1: field Name = Peter
Doc2: field Name = Jack
Doc3: field Name = Peter
Doc4: field Name = Paul
Doc5: field Name = Jack


If I executed the query, it would find documents Doc1 and Doc3 (Peter is the 
same), and Doc2 and Doc5 (Jack is the same).



Thanks,
Peter


spellchecker returns correctlySpelled=true if one term in phrase is correctly spelled

2014-12-02 Thread Tao, Jing
Hi,

It seems that when I do a phrase search, SOLR's spellchecker would return 
correctlySpelled=true if at least one term in the phrase was correctly spelled.
For example:
If I search for "soriasis treatment", SOLR returns over 8000 search results for 
"treatment", correctlySpelled: true, and a spelling suggestion of "psoriasis" 
for "soriasis".
If I search for "soriasis treatmnt", SOLR returns 0 results, 
correctlySpelled:false, and spelling suggestions for both "soriasis" and 
"treatmnt".

Does this mean if I want to display a "Did You Mean" for "soriasis treatment", 
I need to

1)  Check if there are any suggestions returned by spellchecker for any of 
the terms, and

2)  Compare the number of hits for each collation with the numFound for 
original query?

Another spellchecker question I have is: how can I configure SOLR to suggest 
"heart attack" if someone searches for "heart attach"?  Technically, there are 
no misspellings, but "heart attach" as a phrase does not make sense.

Thanks,
Jing
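A hedged SolrJ sketch of the check described above (assuming collations are enabled
on the handler with spellcheck.collateExtendedResults, and server is an existing
SolrServer):

SolrQuery q = new SolrQuery("soriasis treatment");
q.set("spellcheck", "true");
q.set("spellcheck.collate", "true");
q.set("spellcheck.collateExtendedResults", "true");

QueryResponse rsp = server.query(q);
long numFound = rsp.getResults().getNumFound();

SpellCheckResponse sc = rsp.getSpellCheckResponse();
if (sc != null && sc.getCollatedResults() != null) {
    for (SpellCheckResponse.Collation col : sc.getCollatedResults()) {
        // Offer "Did you mean" when a collation would match more documents than
        // the original query did, even if correctlySpelled came back true.
        if (col.getNumberOfHits() > numFound) {
            System.out.println("Did you mean: " + col.getCollatedResult());
        }
    }
}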


Re: Find duplicates

2014-12-02 Thread Alexandre Rafalovitch
And if I am correct, enabling docValues will do this kind of grouping
as part of indexing, in the docValues data structure (per segment).
So all one has to do is get it back (through faceting).

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 11:02, Erik Hatcher erik.hatc...@gmail.com wrote:
 Sort of… if you indexed the full value of the field (and you’re looking for 
 truly exact matches) as a string field type you could facet on that field 
 with facet.mincount=2 and the facets returned would be the ones with 
 duplicate values.  You’d have to drill down on each of the facets returned to 
 find the actual docs.

 Erik

 On Dec 2, 2014, at 10:57 AM, Peter Kirk p...@alpha-solutions.dk wrote:

 Hi

 Is it possible to formulate a Solr query which finds all documents which 
 have the same value in a particular field?
 Note, I don't know what the value is, I just want to find all documents with 
 duplicate values.

 For example, I have 5 documents:

 Doc1: field Name = Peter
 Doc2: field Name = Jack
 Doc3: field Name = Peter
 Doc4: field Name = Paul
 Doc5: field Name = Jack


 If I executed the query, it would find documents Doc1 and Doc3 (Peter is the 
 same), and Doc2 and Doc5 (Jack is the same).



 Thanks,
 Peter



Re: Contextual search

2014-12-02 Thread ASHOK SARMAH
Hi Alex, thanks. I was able to get the suggestion for "thri book" as
"the book of three". But when I search for "threebook" (three and book are now
combined), I am not able to get the suggestion for "a book of three". How do
we solve this?
On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If you need Solr to treat 'thri' (invalid English) as 'three', you
 need to tell it to do so. Look at the synonym modules in the example's
 schema.xml.

 Or you could do phonetic matches. You have a couple of choices for
 those, but basically it's all about the specific analyzer chains to
 experiment with. So, start with that and come back if you still have
 troubles once you understand the way analyzers work.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com
 wrote:
  Hii all .i wanted to know how solr performs contextual search.actually in
  my search list i had given the query as three book.i got the suggestn
 as
  a book of three.which i wanted.but when i specify it as thri book.it
  specifies me of spelling check for thri as three its fyn.but why i dont
 get
  in this case result as a book of three.like previous.



Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Darin Amos
Thanks!

I will take a look at this. I do have an additional question, since after a 
bunch of digging I believe I am going to run into another dead end.

I want to execute the join (or rollup) query, but I want the facets to 
represent the facets of all the child documents, not the resulting product 
documents. From what I gather, this is not possible.

My thought process of what I want to get goes as follows:

1) Execute my search for children
2) Get the facets for all the children
3) Rollup the child dataset into its parent dataset, keeping the score.

Is this easily possible with the tools available today?

Thanks!

Darin



 On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com 
 wrote:
 
 Hello,
 
 AFAIK {!join} doesn't supply any meaningful scores.
 I can suggest https://issues.apache.org/jira/browse/SOLR-6234 
 https://issues.apache.org/jira/browse/SOLR-6234
 
 On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com 
 mailto:dari...@gmail.com wrote:
 
 Hello,
 
 I had sent an email a few days ago talking about implementing a custom
 rollup query component. I have changed directions a little bit because I
 have learned about the JoinQuery.
 
 I have an index that contains a combination of parent and child documents.
 The parent child relationship is always one-to-many.
 
 Here is a very simple sample query:
 
 
  http://localhost:8983/solr/testcore/select?q=*:*&fq={!join%20from=parent%20to=id}type:child
 
 
 When I have a more specific query that actually give some meaningful
 weights:   q=name:(*Shirt*)%20OR%20name:(*Small*)  , it appears the
 rollup query assigns a weight to the parent of the last document
 encountered. For example, if a parents 2 children has weights of 1.4 and
 0.4 without the join query, the parent has a weight of 0.4 after the join
 query.
 
 Is there a way that I can extend or modify the join query so it would
 assign the highest child weight to the parent document?
 
 Thanks!!
 
 Darin
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com http://www.griddynamics.com/
 mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com



Tika HTTP 400 Errors with DIH

2014-12-02 Thread Teague James
Hi all,

I am using Solr 4.9.0 to index a DB with DIH. In the DB there is a URL
field. In the DIH Tika uses that field to fetch and parse the documents. The
URL from the field is valid and will download the document in the browser
just fine. But Tika is getting HTTP response code 400. Any ideas why?

ERROR
BinURLDataSource
java.io.IOException: Server returned HTTP response code: 400 for URL:

EntityProcessorWrapper
Exception in entity :
tika_content:org.apache.solr.handler.dataimport.DataImportHandlerException:
Exception in invoking url

DIH
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="ds-1"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://1.2.3.4/database;instance=INSTANCE;user=USER;password=PASSWORD" />

  <dataSource type="BinURLDataSource" name="ds-2" />

  <document>
    <entity name="db_content" dataSource="ds-1"
            transformer="ClobTransformer, RegexTransformer"
            query="SELECT ContentID, DownloadURL FROM DATABASE.VIEW">
      <field column="ContentID" name="id" />
      <field column="DownloadURL" clob="true" name="DownloadURL" />

      <entity name="tika_content"
              processor="TikaEntityProcessor" url="${db_content.DownloadURL}"
              onError="continue" dataSource="ds-2">
        <field column="TikaParsedContent" />
      </entity>

    </entity>
  </document>
</dataConfig>

SCHEMA - Fields
<field name="DownloadURL" type="string" indexed="true" stored="true" />
<field name="TikaParsedContent" type="text_general" indexed="true"
       stored="true" multiValued="true"/>





Re: Contextual search

2014-12-02 Thread Alexandre Rafalovitch
Well, how would you expect it to solve it, in non-technical terms?
What's the high-level description of "book of three" matching
"threebook" but not, say, "threeof"? A random permutation of any two
words? It's a bit of a strange requirement so far.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 12:55, ASHOK SARMAH ashoksarmah1...@gmail.com wrote:
 Hi alex thnx .i was able to get the get the suggestion for thri book as 
 the book of three.but when i search for threebook (three and book are now
 combined)then i am not able to get the suggestn for a book of three.how
 we solve this?
 On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If you need Solr to treat 'thri' (invalid English) as 'three', you
 need to tell it to do so. Look at the synonym modules in the example's
 schema.xml.

 Or you could do phonetic matches. You have a couple of choices for
 those, but basically it's all about the specific analyzer chains to
 experiment with. So, start with that and come back if you still have
 troubles once you understand the way analyzers work.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com
 wrote:
  Hii all .i wanted to know how solr performs contextual search.actually in
  my search list i had given the query as three book.i got the suggestn
 as
  a book of three.which i wanted.but when i specify it as thri book.it
  specifies me of spelling check for thri as three its fyn.but why i dont
 get
  in this case result as a book of three.like previous.



Re: Tika HTTP 400 Errors with DIH

2014-12-02 Thread Alexandre Rafalovitch
On 2 December 2014 at 13:19, Teague James teag...@insystechinc.com wrote:
 clob=true

What is ClobTransformer doing on the DownloadURL field? Is it
possible it is corrupting the value somehow?
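One quick way to rule out the stored value itself is to fetch it from plain Java,
outside DIH, and compare the response code (a hedged sketch; unencoded spaces or
query characters in DownloadURL are a common cause of 400s):

// imports: java.net.URL, java.net.HttpURLConnection
String downloadUrl = "...";   // paste the exact value that comes out of the DB / ClobTransformer
HttpURLConnection conn = (HttpURLConnection) new URL(downloadUrl).openConnection();
conn.setRequestMethod("GET");
System.out.println("HTTP " + conn.getResponseCode() + " for " + downloadUrl);
conn.disconnect();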

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
Have you considered using grouping?  If I understand your requirements, 
I think it does what you want.


https://cwiki.apache.org/confluence/display/solr/Result+Grouping

On 12/02/2014 12:59 PM, Darin Amos wrote:

Thanks!

I will take a look at this. I do have an additional question, since after a 
bunch of digging I believe I am going to run into another dead end.

I want to execute the join (or rollup) query, but I want the facets to 
represent the facets of all the child documents, not the resulting product 
documents. From what I gather, this is not possible.

My thought process of what I want to get goes as follows:

1) Execute my search for children
2) Get the facets for all the children
3) Rollup the child dataset into its parent dataset, keeping the score.

Is this easily possible with the tools available today?

Thanks!

Darin




On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:

Hello,

AFAIK {!join} doesn't supply any meaningful scores.
I can suggest https://issues.apache.org/jira/browse/SOLR-6234 
https://issues.apache.org/jira/browse/SOLR-6234

On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com 
mailto:dari...@gmail.com wrote:


Hello,

I had sent an email a few days ago talking about implementing a custom
rollup query component. I have changed directions a little bit because I
have learned about the JoinQuery.

I have an index that contains a combination of parent and child documents.
The parent child relationship is always one-to-many.

Here is a very simple sample query:


http://localhost:8983/solr/testcore/select?q=*:*fq={!join%20from=parent%20to=id}type:child

http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
 
http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
When I have a more specific query that actually give some meaningful
weights:   q=name:(*Shirt*)%20OR%20name:(*Small*)  , it appears the
rollup query assigns a weight to the parent of the last document
encountered. For example, if a parents 2 children has weights of 1.4 and
0.4 without the join query, the parent has a weight of 0.4 after the join
query.

Is there a way that I can extend or modify the join query so it would
assign the highest child weight to the parent document?

Thanks!!

Darin




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com http://www.griddynamics.com/
mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com






Re: Getting the position of a word via Solr API

2014-12-02 Thread Michael Sokolov
I would keep trying with the highlighters.  Some of them, at least, have 
options to provide an external text source, although you will almost 
certainly  have to write some java code to get this working; extend the 
highlighter you choose and supply its text from an external source.


-Mike

On 12/02/2014 08:13 AM, adfel70 wrote:

Small update,
I have managed making the Term Vector to work and I am getting all the words
of the text field.

The problem is that it doesn't work with several words combined, I can't
find the offset of the needed expression starts...

Any ideas anyone?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-the-position-of-a-word-via-Solr-API-tp4171877p4172092.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Darin Amos
Hi,

Thanks for the response, I have considered grouping often, but grouping does 
not return the parent document, just the group id. I would still have to add 
something to take the group id’s and get the parent documents.

Thanks

Darin

 On Dec 2, 2014, at 2:11 PM, Michael Sokolov msoko...@safaribooksonline.com 
 wrote:
 
 Have you considered using grouping?  If I understand your requirements, I 
 think it does what you want.
 
 https://cwiki.apache.org/confluence/display/solr/Result+Grouping 
 https://cwiki.apache.org/confluence/display/solr/Result+Grouping
 
 On 12/02/2014 12:59 PM, Darin Amos wrote:
 Thanks!
 
 I will take a look at this. I do have an additional question, since after a 
 bunch of digging I believe I am going to run into another dead end.
 
 I want to execute the join (or rollup) query, but I want the facets to 
 represent the facets of all the child documents, not the resulting product 
 documents. From what I gather, this is not possible.
 
 My thought process of what I want to get goes as follows:
 
 1) Execute my search for children
 2) Get the facets for all the children
 3) Rollup the child dataset into its parent dataset, keeping the score.
 
 Is this easily possible with the tools available today?
 
 Thanks!
 
 Darin
 
 
 
 On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com 
 wrote:
 
 Hello,
 
 AFAIK {!join} doesn't supply any meaningful scores.
 I can suggest https://issues.apache.org/jira/browse/SOLR-6234 
 https://issues.apache.org/jira/browse/SOLR-6234 
 https://issues.apache.org/jira/browse/SOLR-6234 
 https://issues.apache.org/jira/browse/SOLR-6234
 
 On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com 
 mailto:dari...@gmail.com mailto:dari...@gmail.com 
 mailto:dari...@gmail.com wrote:
 
 Hello,
 
 I had sent an email a few days ago talking about implementing a custom
 rollup query component. I have changed directions a little bit because I
 have learned about the JoinQuery.
 
 I have an index that contains a combination of parent and child documents.
 The parent child relationship is always one-to-many.
 
 Here is a very simple sample query:
 
 
 http://localhost:8983/solr/testcore/select?q=*:*fq={!join%20from=parent%20to=id}type:child
  
 http://localhost:8983/solr/testcore/select?q=*:*fq={!join%20from=parent%20to=id}type:child
 
 http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
  
 http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:childhttp://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
  
 http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
 When I have a more specific query that actually give some meaningful
 weights:   q=name:(*Shirt*)%20OR%20name:(*Small*)  , it appears the
 rollup query assigns a weight to the parent of the last document
 encountered. For example, if a parents 2 children has weights of 1.4 and
 0.4 without the join query, the parent has a weight of 0.4 after the join
 query.
 
 Is there a way that I can extend or modify the join query so it would
 assign the highest child weight to the parent document?
 
 Thanks!!
 
 Darin
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com http://www.griddynamics.com/ 
 http://www.griddynamics.com/ http://www.griddynamics.com/
 mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com 
 mailto:mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com



Re: SOLR Join Query, Use highest weight.

2014-12-02 Thread Michael Sokolov
We simply index parent and child documents with the same field value, 
and group on that, querying both parent and child documents. If you 
boost the parent it will show up as the first result in the group. Then 
you get all related documents together in the same group.


-Mike
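A rough SolrJ sketch of that pattern (the field names groupKey, name and type are
made up for illustration, and an existing SolrServer named server is assumed; the
optional boosted clause lifts the parent to the top of its group):

// Parents and children carry the same groupKey; the required clause matches either,
// and the optional type:parent^10 clause boosts parents so they sort first per group.
SolrQuery q = new SolrQuery("+name:(Shirt OR Small) type:parent^10");
q.set("group", "true");
q.set("group.field", "groupKey");
q.set("group.limit", "20");   // keep the matching children in each group as well

QueryResponse rsp = server.query(q);
for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
    for (Group g : cmd.getValues()) {
        // With the boost, the parent (when it matches) sorts first; the rest are its children.
        System.out.println(g.getGroupValue() + " -> " + g.getResult().size() + " docs");
    }
}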

On 12/02/2014 02:27 PM, Darin Amos wrote:

Hi,

Thanks for the response, I have considered grouping often, but grouping does 
not return the parent document, just the group id. I would still have to add 
something to take the group id’s and get the parent documents.

Thanks

Darin


On Dec 2, 2014, at 2:11 PM, Michael Sokolov msoko...@safaribooksonline.com 
wrote:

Have you considered using grouping?  If I understand your requirements, I think 
it does what you want.

https://cwiki.apache.org/confluence/display/solr/Result+Grouping 
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

On 12/02/2014 12:59 PM, Darin Amos wrote:

Thanks!

I will take a look at this. I do have an additional question, since after a 
bunch of digging I believe I am going to run into another dead end.

I want to execute the join (or rollup) query, but I want the facets to 
represent the facets of all the child documents, not the resulting product 
documents. From what I gather, this is not possible.

My thought process of what I want to get goes as follows:

1) Execute my search for children
2) Get the facets for all the children
3) Rollup the child dataset into its parent dataset, keeping the score.

Is this easily possible with the tools available today?

Thanks!

Darin




On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:

Hello,

AFAIK {!join} doesn't supply any meaningful scores.
I can suggest https://issues.apache.org/jira/browse/SOLR-6234 
https://issues.apache.org/jira/browse/SOLR-6234 
https://issues.apache.org/jira/browse/SOLR-6234 
https://issues.apache.org/jira/browse/SOLR-6234

On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com mailto:dari...@gmail.com 
mailto:dari...@gmail.com mailto:dari...@gmail.com wrote:


Hello,

I had sent an email a few days ago talking about implementing a custom
rollup query component. I have changed directions a little bit because I
have learned about the JoinQuery.

I have an index that contains a combination of parent and child documents.
The parent child relationship is always one-to-many.

Here is a very simple sample query:


http://localhost:8983/solr/testcore/select?q=*:*fq={!join%20from=parent%20to=id}type:child
 
http://localhost:8983/solr/testcore/select?q=*:*fq={!join%20from=parent%20to=id}type:child

http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child 
http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:childhttp://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
 http://localhost:8983/solr/testcore/select?q=*:*fq=%7B!join%20from=parent%20to=id%7Dtype:child
When I have a more specific query that actually give some meaningful
weights:   q=name:(*Shirt*)%20OR%20name:(*Small*)  , it appears the
rollup query assigns a weight to the parent of the last document
encountered. For example, if a parents 2 children has weights of 1.4 and
0.4 without the join query, the parent has a weight of 0.4 after the join
query.

Is there a way that I can extend or modify the join query so it would
assign the highest child weight to the parent document?

Thanks!!

Darin



--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com http://www.griddynamics.com/ http://www.griddynamics.com/ 
http://www.griddynamics.com/
mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com 
mailto:mkhlud...@griddynamics.com mailto:mkhlud...@griddynamics.com






Solr collection alias - how rank is affected

2014-12-02 Thread SolrUser1543
Solr allows creating an alias for a few collections via its API.

Suppose I have two collections, C1 and C2, and an alias C3 = C1, C2.

C1 and C2 are deployed on different machines, but share a common ZooKeeper.

How is ranking affected when searching the C3 collection?
When they have the same schema?
Different schemas? (It is possible to search on different schemas if we use
field aliases on both collections.)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4172197.html
Sent from the Solr - User mailing list archive at Nabble.com.


indexing numbers in texts for range queries

2014-12-02 Thread Mikhail Khludnev
Hello Searchers,

Does anyone remember any examples of indexing numbers inside plain text?
E.g., if I have the text "foo and 10 bars", I want to find it with a query like
"foo [8 TO 20] bars".
Question no. 1: should the trie terms be put into a separate field, or can they
reside in the same text field? Note, enumerating [0-9]* terms in a
MultiTermQuery is not an option for me; I definitely need the trie field
magic!
Perhaps you can point me to a blog or chapter, whatever makes me happy.

Thanks a lot!

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov
Mikhail - I can imagine a filter that strips out everything but numbers 
and then indexes those with a (separate) numeric (trie) field.  But I 
don't believe you can do phrase or other proximity queries across 
multiple fields.  As long as an or-query is good enough, I think this 
problem is not too hard?  But if you need proximity it becomes more 
complicated.  Once in the distant past we coded a numeric range query 
using a complicated set of wildcard queries that could handle large 
numbers efficiently - this search index (Verity) had no range 
capability, so we had to mock it up using text.  The way this worked was 
something along these lines:


1) transform all the numbers into their binary encoding (8 = 0b1000, eg)
2) write queries by encoding the range as a set of bitmasks represented 
by wildcard queries:

[8 TO 20] becomes (0b1000 0b000100?? 0b00010100)

I know you said you cannot use [0-9]* terms, but you will not see 
terrible term explosion with this.  What's your concern there?


-Mike


On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:

Hello Searchers,

Don't you remember any examples of indexing numbers inside of plain text.
eg. if I have a text: foo and 10 bars I want to find it with a query like
foo [8 TO 20] bars.
The question no.1 whether to put trie terms into the separate field or they
can reside at the same text one? Note, enumerating [0-9]* terms in
MultiTermQuery is not an option for me, I definitely need the trie field
magic!
Perhaps you can remind a blog or chapter, whatever makes me happy.

Thanks a lot!





Re: indexing numbers in texts for range queries

2014-12-02 Thread Mikhail Khludnev
Hello Michael,

On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 Mikhail - I can imagine a filter that strips out everything but numbers
 and then indexes those with a (separate) numeric (trie) field.  But I don't
 believe you can do phrase or other proximity queries across multiple
 fields.

Technically it's not a big deal. I used FieldMaskingSpanQuery before.
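For reference, a sketch of that span plumbing in Lucene (it assumes a text field
plus a parallel trie field nums, and deliberately leaves open the question above of
whether the trie terms carry positions the span scorer can use):

// imports: org.apache.lucene.index.Term, org.apache.lucene.search.*, org.apache.lucene.search.spans.*
SpanQuery foo  = new SpanTermQuery(new Term("text", "foo"));
SpanQuery bars = new SpanTermQuery(new Term("text", "bars"));

// Numeric range on the trie field, lifted into span land and masked as "text"
// so it can take part in the same SpanNearQuery.
SpanQuery range = new SpanMultiTermQueryWrapper<NumericRangeQuery<Integer>>(
        NumericRangeQuery.newIntRange("nums", 8, 20, true, true));
SpanQuery rangeAsText = new FieldMaskingSpanQuery(range, "text");

Query q = new SpanNearQuery(new SpanQuery[] { foo, rangeAsText, bars }, 0, true);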

As long as an or-query is good enough, I think this problem is not too
 hard?  But if you need proximity it becomes more complicated.  Once in the
 distant past we coded a numeric range query using a complicated set of
 wildcard queries that could handle large numbers efficiently - this search
 index (Verity) had no range capability, so we had to mock it up using
 text.  The way this worked was something along these lines:

 1) transform all the numbers into their binary encoding (8 = 0b1000,
 eg)
 2) write queries by encoding the range as a set of bitmasks represented by
 wildcard queries:
 [8 TO 20] becomes (0b1000 0b000100?? 0b00010100)

 I know you said you cannot use [0-9]* terms, but you will not see terrible
 term explosion with this.  What's your concern there?

it's not terrible, but it is significant; I want to give the trie magic a try,
since it reduces query-time processing.

Thanks for suggestions.
Do I remember correctly that you ignored last Lucene Revolution?


 -Mike



 On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:

 Hello Searchers,

 Don't you remember any examples of indexing numbers inside of plain text.
 eg. if I have a text: foo and 10 bars I want to find it with a query
 like
 foo [8 TO 20] bars.
 The question no.1 whether to put trie terms into the separate field or
 they
 can reside at the same text one? Note, enumerating [0-9]* terms in
 MultiTermQuery is not an option for me, I definitely need the trie field
 magic!
 Perhaps you can remind a blog or chapter, whatever makes me happy.

 Thanks a lot!





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: indexing numbers in texts for range queries

2014-12-02 Thread Michael Sokolov


On 12/02/2014 03:41 PM, Mikhail Khludnev wrote:
Thanks for suggestions. Do I remember correctly that you ignored last 
Lucene Revolution?
I wouldn't say I ignored it, but it's true I wasn't there in DC: I'm 
excited to catch up on the presentations as the videos become available, 
though.


-Mike


Re: indexing numbers in texts for range queries

2014-12-02 Thread Ahmet Arslan
Hi Mikhail,

Range queries are allowed inside phrases with the ComplexPhraseQParser, but I think 
string ordering is used.

Also, LUCENE-5205 / SOLR-5410 is meant to supersede complex phrase. It might 
have that functionality too.

Ahmet
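For example, something along these lines (a sketch only; it assumes Solr 4.8+ where
the complexphrase parser is bundled and an existing SolrServer named server, and as
noted the embedded range is compared as strings, so the indexed numbers and the
bounds would need zero-padding to a fixed width to order sensibly):

// Phrase with an embedded range term, handled by the ComplexPhraseQParser.
SolrQuery q = new SolrQuery("{!complexphrase}\"foo [08 TO 20] bars\"");
QueryResponse rsp = server.query(q);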
 



On Tuesday, December 2, 2014 10:43 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:
Hello Michael,

On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 Mikhail - I can imagine a filter that strips out everything but numbers
 and then indexes those with a (separate) numeric (trie) field.  But I don't
 believe you can do phrase or other proximity queries across multiple
 fields.

Technically it's not a big deal. I used FieldMaskingSpanQuery before.

As long as an or-query is good enough, I think this problem is not too
 hard?  But if you need proximity it becomes more complicated.  Once in the
 distant past we coded a numeric range query using a complicated set of
 wildcard queries that could handle large numbers efficiently - this search
 index (Verity) had no range capability, so we had to mock it up using
 text.  The way this worked was something along these lines:

 1) transform all the numbers into their binary encoding (8 = 0b1000,
 eg)
 2) write queries by encoding the range as a set of bitmasks represented by
 wildcard queries:
 [8 TO 20] becomes (0b1000 0b000100?? 0b00010100)

 I know you said you cannot use [0-9]* terms, but you will not see terrible
 term explosion with this.  What's your concern there?

it's not terrible but significant, I wish to make a try with the trie
magic, which reduces query time processing.

Thanks for suggestions.
Do I remember correctly that you ignored last Lucene Revolution?


 -Mike



 On 12/02/2014 02:59 PM, Mikhail Khludnev wrote:

 Hello Searchers,

 Don't you remember any examples of indexing numbers inside of plain text.
 eg. if I have a text: foo and 10 bars I want to find it with a query
 like
 foo [8 TO 20] bars.
 The question no.1 whether to put trie terms into the separate field or
 they
 can reside at the same text one? Note, enumerating [0-9]* terms in
 MultiTermQuery is not an option for me, I definitely need the trie field
 magic!
 Perhaps you can remind a blog or chapter, whatever makes me happy.

 Thanks a lot!





-- 
Sincerely yours

Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com



Re: Replication of a corrupt master index

2014-12-02 Thread Erick Erickson
If nothing else, the disk underlying the index could have a bad spot...

There have been some corrupt index bugs in the past, but they always
get a super-high priority for fixing so don't hang around for long.

You can always take periodic backups. Perhaps the slickest way to do that
is to set up a slave that does nothing but poll once/day. Since you know
that's not changing, you can do simple disk copies of the index and at least
minimize your possible outage.

Now, all that said you may want to consider SolrCloud. The advantage there is
that each node gets the raw input and very rarely does replication. Failover
is as simple in that scenario as killing the bad node and things just work.

Best,
Erick

On Tue, Dec 2, 2014 at 7:40 AM, Charra, Johannes
johannes.charrahorstm...@haufe-lexware.com wrote:
 Thanks for your response, Erick.

 Do you think it is possible to corrupt an index merely with HTTP requests? 
 I've been using the aforementioned m/s setup for years now and have never 
 seen a master failure.

 I'm trying to think of scenarios where this setup (1 master, 4 slaves) might 
 have a total outage. The master runs on a h/a cluster.

 Regards,
 Johannes

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, 2 December 2014 15:54
 To: solr-user@lucene.apache.org
 Subject: Re: Replication of a corrupt master index

 No. The master is the master and will always stay the master unless you 
 change it. This is one of the reasons I really like to keep the original 
 source around in case I every have this problem.

 Best,
 Erick

 On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes 
 johannes.charrahorstm...@haufe-lexware.com wrote:

 Hi,

 If I have a master/slave setup and the master index gets corrupted, will the 
 slaves realize they should not replicate from the master anymore, since the 
 master does not have a newer index version?

 I'm using Solr version 4.2.1.

 Regards,
 Johannes




Re: Contextual search

2014-12-02 Thread ASHOK SARMAH
HI Alex,

I have specified following in my solrconfig.xml ::

  <str name="spellcheck">on</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.count">5</str>
  <str name="spellcheck.alternativeTermCount">2</str>
  <str name="spellcheck.maxResultsForSuggest">5</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <str name="spellcheck.maxCollationTries">5</str>
  <str name="spellcheck.maxCollations">3</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.MinBreakWordLength">5</str>


I have written <str name="spellcheck.dictionary">wordbreak</str> and
<str name="spellcheck.MinBreakWordLength">5</str> to break words with a
minimum length of 5. Then it should break my word "threebook" into "three" and
"book", right? Correct me if I am wrong. But I am not getting the required
search results. Kindly suggest.

On Wed, Dec 3, 2014 at 12:08 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Well, how would you expect it to solve it - in non-technical terms.
 What's the high level description of book of three matching
 threebook and not say threeof? Random permutation of any two
 words? It's a bit of a strange requirement so far.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 2 December 2014 at 12:55, ASHOK SARMAH ashoksarmah1...@gmail.com
 wrote:
  Hi alex thnx .i was able to get the get the suggestion for thri book as 
  the book of three.but when i search for threebook (three and book are
 now
  combined)then i am not able to get the suggestn for a book of three.how
  we solve this?
  On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  If you need Solr to treat 'thri' (invalid English) as 'three', you
  need to tell it to do so. Look at the synonym modules in the example's
  schema.xml.
 
  Or you could do phonetic matches. You have a couple of choices for
  those, but basically it's all about the specific analyzer chains to
  experiment with. So, start with that and come back if you still have
  troubles once you understand the way analyzers work.
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and
 @solrstart
  Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
  On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com
  wrote:
   Hii all .i wanted to know how solr performs contextual
 search.actually in
   my search list i had given the query as three book.i got the
 suggestn
  as
   a book of three.which i wanted.but when i specify it as thri
 book.it
   specifies me of spelling check for thri as three its fyn.but why i
 dont
  get
   in this case result as a book of three.like previous.
 



Re: Contextual search

2014-12-02 Thread ASHOK SARMAH
Hi Alex,

I have specified these in the solrconfig.xml as:

   <str name="spellcheck">on</str>
   <str name="spellcheck.extendedResults">true</str>
   <str name="spellcheck.count">5</str>
   <str name="spellcheck.alternativeTermCount">2</str>
   <str name="spellcheck.maxResultsForSuggest">5</str>
   <str name="spellcheck.collate">true</str>
   <str name="spellcheck.collateExtendedResults">true</str>
   <str name="spellcheck.maxCollationTries">5</str>
   <str name="spellcheck.maxCollations">3</str>
   <str name="spellcheck.dictionary">wordbreak</str>
   <str name="spellcheck.MinBreakWordLength">5</str>



The lines <str name="spellcheck.dictionary">wordbreak</str> and
<str name="spellcheck.MinBreakWordLength">5</str> are for breaking the
word "threebook" into "three" and "book". But even then it is not finding the
string "A book of three". Kindly suggest what other ways this can be done.
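
As a rough illustration, a request that asks for both the default and the
wordbreak dictionaries at once might look like this (host, core and handler
names are placeholders):

  http://localhost:8983/solr/core1/spell?q=threebook&spellcheck=true&spellcheck.dictionary=default&spellcheck.dictionary=wordbreak&spellcheck.collate=true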

On Wed, Dec 3, 2014 at 12:08 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Well, how would you expect it to solve it - in non-technical terms.
 What's the high level description of book of three matching
 threebook and not say threeof? Random permutation of any two
 words? It's a bit of a strange requirement so far.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 2 December 2014 at 12:55, ASHOK SARMAH ashoksarmah1...@gmail.com
 wrote:
  Hi alex thnx .i was able to get the get the suggestion for thri book as 
  the book of three.but when i search for threebook (three and book are
 now
  combined)then i am not able to get the suggestn for a book of three.how
  we solve this?
  On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  If you need Solr to treat 'thri' (invalid English) as 'three', you
  need to tell it to do so. Look at the synonym modules in the example's
  schema.xml.
 
  Or you could do phonetic matches. You have a couple of choices for
  those, but basically it's all about the specific analyzer chains to
  experiment with. So, start with that and come back if you still have
  troubles once you understand the way analyzers work.
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and
 @solrstart
  Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
  On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com
  wrote:
   Hii all .i wanted to know how solr performs contextual
 search.actually in
   my search list i had given the query as three book.i got the
 suggestn
  as
   a book of three.which i wanted.but when i specify it as thri
 book.it
   specifies me of spelling check for thri as three its fyn.but why i
 dont
  get
   in this case result as a book of three.like previous.
 



Re: Contextual search

2014-12-02 Thread Alexandre Rafalovitch
Sorry, beyond my area of expertise now. Hopefully somebody else will pitch in.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 2 December 2014 at 22:03, ASHOK SARMAH ashoksarmah1...@gmail.com wrote:
 HI Alex,

 I have specified these in the solrconfig.xml as::

 <str name="spellcheck">on</str>
 <str name="spellcheck.extendedResults">true</str>
 <str name="spellcheck.count">5</str>
 <str name="spellcheck.alternativeTermCount">2</str>
 <str name="spellcheck.maxResultsForSuggest">5</str>
 <str name="spellcheck.collate">true</str>
 <str name="spellcheck.collateExtendedResults">true</str>
 <str name="spellcheck.maxCollationTries">5</str>
 <str name="spellcheck.maxCollations">3</str>
 <str name="spellcheck.dictionary">wordbreak</str>
 <str name="spellcheck.MinBreakWordLength">5</str>



 The lines <str name="spellcheck.dictionary">wordbreak</str> and
 <str name="spellcheck.MinBreakWordLength">5</str> are for breaking the
 word "threebook" into "three" and "book". But even then it is not finding the
 string "A book of three". Kindly suggest what other ways this can be done.

 On Wed, Dec 3, 2014 at 12:08 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

 Well, how would you expect it to solve it - in non-technical terms.
 What's the high level description of book of three matching
 threebook and not say threeof? Random permutation of any two
 words? It's a bit of a strange requirement so far.

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 2 December 2014 at 12:55, ASHOK SARMAH ashoksarmah1...@gmail.com
 wrote:
  Hi alex thnx .i was able to get the get the suggestion for thri book as 
  the book of three.but when i search for threebook (three and book are
 now
  combined)then i am not able to get the suggestn for a book of three.how
  we solve this?
  On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  If you need Solr to treat 'thri' (invalid English) as 'three', you
  need to tell it to do so. Look at the synonym modules in the example's
  schema.xml.
 
  Or you could do phonetic matches. You have a couple of choices for
  those, but basically it's all about the specific analyzer chains to
  experiment with. So, start with that and come back if you still have
  troubles once you understand the way analyzers work.
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and
 @solrstart
  Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
  On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com
  wrote:
   Hii all .i wanted to know how solr performs contextual
 search.actually in
   my search list i had given the query as three book.i got the
 suggestn
  as
   a book of three.which i wanted.but when i specify it as thri
 book.it
   specifies me of spelling check for thri as three its fyn.but why i
 dont
  get
   in this case result as a book of three.like previous.