Re: best way to contribute solr??
Okay sir, I will mail to solr-user only. I am very thankful to you for all your help. I am a Java developer with good knowledge of Perl, working on Solr; actually, I just started working on Solr, for geospatial search (not using JTS) only. To be very frank, from Mr. Yonik's tutorial I learned about faceting, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? Yesterday I subscribed to solr-start as well. And sir, what do you mean by *Create a basic project using that library and latest version of Solr*?

With Regards, Aman Tandon

On Thu, Apr 10, 2014 at 11:14 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Hi Aman, Nice of you to want to help. Let's keep the discussion on the user mailing list as opposed to the developer one (most people are on both). What is your skill set? Are you familiar with particular languages? If so, the easiest way to contribute would be the following:
1) Find all the Solr client libraries in the language you are most familiar with (PHP, Java, Perl, Python, etc.).
2) Create a basic project using that library and the latest version of Solr. Maybe use the Solr tutorial as a baseline and show how to do the same steps with the client instead of the command line/curl.
3) Write a blog post about what you learned: whether the library supports the latest Solr well and whether it supports the latest Solr features (e.g. schemaless mode, Near-Real-Time, SolrCloud).
If that does not appeal, give an example of where your skills are strongest and I am sure there is a way for you to contribute.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, Apr 10, 2014 at 12:36 PM, Aman Tandon amantandon...@gmail.com wrote:

Can anybody please explain to me how I should start contributing to Solr? I am a novice here, as well as in this technology, but I am learning Solr day by day. So how should I start?
Thanks Aman Tandon
Tomcat creates a thread for each SOLR core
Hi guys, I need some help. After updating to Solr 4.4, the Tomcat process is consuming about 2 GB of memory, and CPU usage is about 40% for about 10 minutes after the start. However, the bigger problem is that I have about 1000 cores and it seems a thread is created for each core. The process has more than 1000 threads and everything is extremely slow. Creating or unloading a core, even without documents, takes about 20 minutes. Searching is more or less good, but storing also takes a long time. Is there some configuration I missed or got wrong? There aren't many calls; I use 64-bit Tomcat 7, Solr 4.4 and the latest 64-bit Java. The machine has 24 GB of RAM, a CPU with 16 cores, and is running Windows Server 2008 R2. The index is updated every 30 seconds / 10,000 documents. I hadn't checked the number of threads before the update, because I didn't have to; it was working just fine. Any suggestion will be highly appreciated. Thank you in advance. Regards, Atanas
Re: Ranking code
For a better analysis of document ranking, you should query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy the resulting XML and paste it into http://explain.solr.pl/ ; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Tue, Apr 8, 2014 at 9:09 PM, Shawn Heisey s...@elyograg.org wrote: On 4/8/2014 3:55 AM, azhar2007 wrote: I'm basically trying to understand how results are ranked. What's the algorithm behind it? If you add a debugQuery parameter to your request, set to true, you will see the score calculation for every document included in the response. This is the default similarity class that Solr uses: http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html Thanks, Shawn
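To make the suggestion above concrete, here is a small sketch that builds such a debug request URL. The host, collection name, and query value are placeholders, not taken from the thread; adjust them to your own setup:

```python
from urllib.parse import urlencode

# Placeholder Solr endpoint; substitute your own host and collection.
base = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "title:solr",   # your actual query goes here
    "debug": "true",     # ask Solr to include scoring explanations
    "wt": "xml",         # XML output, as expected by explain.solr.pl
}
url = base + "?" + urlencode(params)
print(url)
```

The response will contain an explain section per matching document, which is what explain.solr.pl visualizes.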
Re: best way to contribute solr??
Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and recent version of Perl? Can you do a small demo that uses Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Tomcat creates a thread for each SOLR core
I suspect this is due to the firstSearcher event defined in solrconfig.xml; you could make some tweaks there, and I hope that will help. We are using the same setup you just mentioned, but we use a separate indexing server and replicate the data to our other two servers, so that indexing won't hurt search performance. Thanks, Aman Tandon
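For reference, the firstSearcher warm-up queries being referred to live in solrconfig.xml in a listener like the following (an illustrative fragment, not taken from the poster's config). Trimming or removing these queries reduces the work each core does on startup:

```xml
<!-- Runs once when a core's very first searcher is opened. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Each entry is a warming query executed at startup. -->
    <lst><str name="q">static warming query</str></lst>
  </arr>
</listener>
```

With 1000 cores, even a few warming queries per core add up to substantial startup work, which is why this is a plausible place to look.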
Re: Tomcat creates a thread for each SOLR core
Are you using all those cores at once? If not, there is a recent setting that allows Solr to load cores on demand. If you are using them all, perhaps you need to look into splitting them across different machines (horizontal scaling). What about your caches? How many additional structures have you configured for each core? How much memory have you allocated to the Java process? You are probably running out of memory and thrashing against swap. I am not even sure a Java process can access that much memory in one process. You might be better off running multiple Tomcat/Solr instances on the same machine with different subsets of cores. Regards, Alex. P.S. This is general advice; I don't know the specific issues around that version of Solr/Tomcat. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
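The on-demand core loading mentioned above is configured per core in solr.xml. A sketch in the Solr 4.x legacy solr.xml format (core names and paths are placeholders):

```xml
<solr persistent="true">
  <!-- transientCacheSize caps how many transient cores stay loaded at once;
       least-recently-used cores beyond this are unloaded automatically. -->
  <cores adminPath="/admin/cores" transientCacheSize="50">
    <!-- loadOnStartup="false" defers loading until the first request;
         transient="true" lets Solr unload the core under memory pressure. -->
    <core name="core0001" instanceDir="core0001"
          loadOnStartup="false" transient="true"/>
    <!-- ...one entry per core... -->
  </cores>
</solr>
```

With 1000 cores this keeps only a bounded subset resident, which should directly address the thread and memory growth described above.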
Re: best way to contribute solr??
Thank you so much, sir :) Can I try in Java as well? Thanks, Aman Tandon
Re: best way to contribute solr??
I will also try to do it in Perl as well; this is going to be something great, I am excited :D Thanks a ton!! Thanks, Aman Tandon
Re: best way to contribute solr??
Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up to date. But there could still be more tutorials. For other languages/clients, there is a lot less information available, especially once you start adding (human) languages into it, e.g. how to process your own language (if non-English). And there are many more ideas on slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup , as well as an example of a processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: best way to contribute solr??
Thanks sir, I will look into this. Solr and its developers are all helpful and awesome; I am feeling great. Thanks, Aman Tandon
Re: Tomcat creates a thread for each SOLR core
Thanks for the quick responses. I have allocated 1 GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=*false* and transient=*true* options for each core and see what happens. These are the exceptions from the log files:

SEVERE: Servlet.service() for servlet [default] in context with path [/solrt] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
    at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:315)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

and:

SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371)
    at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333)
    at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101)
    at sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
    at sun.nio.cs.StreamEncoder.flush(Unknown Source)
    at java.io.OutputStreamWriter.flush(Unknown Source)
    at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Software caused connection abort: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(Unknown Source)
    at java.net.SocketOutputStream.write(Unknown Source)
    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
    at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119)
    at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799)
    at org.apache.coyote.Response.action(Response.java:174)
    at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366)
    ... 24 more

On Thu, Apr 10, 2014 at 9:51 AM, Alexandre Rafalovitch
Re: Tomcat creates a thread for each SOLR core
On Thu, Apr 10, 2014 at 2:14 PM, Atanas Atanasov atanaso...@gmail.com wrote: SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error Separate issue, but most likely the client closed the browser and the server had nowhere to send the response to, so it complained. This happens if your serving process is too slow. The other one might be the same or might be different. The server sends headers and expects the body to follow. Then, during processing of the body, an error occurs. The server changes its mind and wants to send an error (e.g. HTTP 500 instead of HTTP 200), but it's too late: the headers were already sent out. So it complains to the log file instead. The real question is not this exception, but the internal error that caused the server to change its mind. I would concentrate on speed first and see if these problems go away. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Solr special characters like '(' and ''?
mark. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-special-characters-like-and-tp4129854p4130333.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting huge difference in QTime for terms.lower and terms.prefix
Hi, When I query the terms component with terms.prefix, the QTime is 100 milliseconds, whereas when I give the same query with terms.lower, the QTime is 500 milliseconds. I am using SolrCloud. In both cases I give terms.limit=60 and terms.sort=index.
Query 1 params: terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key — QTime: 100 milliseconds
Query 2 params: terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key — QTime: 500 milliseconds
The response gives the same terms in both queries, but the QTime is different. Please let me know why there is a difference in QTime between the two approaches. Thanks, Jilani
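For anyone reproducing the comparison, the two requests can be built like this (a sketch; the host and collection name are placeholders, only the parameters come from the message above):

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute your own host and collection.
base = "http://localhost:8983/solr/collection1/terms"

common = {
    "terms.fl": "field_Name",
    "terms.limit": 60,
    "terms.sort": "index",
    "wt": "json",
    "shard.keys": "shard_key",
}

# Query 1: enumerate terms that start with the prefix "b".
q1 = base + "?" + urlencode({**common, "terms.prefix": "b"})

# Query 2: enumerate terms starting from "b" onward; terms.lower only
# sets the start of the range (no upper bound is given here).
q2 = base + "?" + urlencode({**common, "terms.lower": "b"})

print(q1)
print(q2)
```

Note the semantic difference: terms.prefix constrains every returned term, while terms.lower is just a starting point in the term dictionary, which may explain why the two apparently equivalent queries cost different amounts of work.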
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
Thanks for responding, Wolfgang. Changing to LUCENE_43: IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null); didn't affect the index format version because, I believe, if the format of the index to merge is already of a higher version (4.1 in this case), it will merge to the same and not to a lower version (4.0). But the format version certainly could be read from the solrconfig, you are right. Dmitry On Wed, Apr 9, 2014 at 11:51 PM, Wolfgang Hoschek whosc...@cloudera.com wrote: There is a current limitation in that the code doesn't actually look into solrconfig.xml for the version. We should fix this, indeed. See https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101 Wolfgang. On Apr 8, 2014, at 11:49 AM, Dmitry Kan solrexp...@gmail.com wrote: Hello, When we instantiate the MapReduceIndexerTool with the collection's conf directory, we expect that the Lucene version is respected and the index gets generated in a format compatible with the defined version. This does not seem to happen, however. Checking with Luke: the expected Lucene index format is Lucene 4.0; the output Lucene index format is Lucene 4.1. Can anybody shed some light on the semantics behind specifying the Lucene version in this context? Does this have something to do with what version of Solr core is used by the morphline library? Thanks, Dmitry -- Forwarded message -- Dear list, We have been generating Solr indices with the solr-hadoop contrib module (SOLR-1301). Our current Solr in use is version 4.3.1. Is there any tool that could do the backward conversion, i.e. 4.7 to 4.3.1? Or is an upgrade the only way to go? -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Shared Stored Field
Hello, We have a denormalized index where certain documents point, in essence, to the same content. The relevance of a document depends on the current context. E.g. document A has a different boost factor when we apply filter F1 than when we use filter F2 (or F3, etc). To support this, we denormalize document A with a unique boost field, so that it has a different relevance for each filter it can be found in. The problem is that the documents have a big stored content field that is required for the highlighting snippets. This denormalization grows the index size by a factor of 100 in the worst case. Storing the same big content field many times seems really inefficient. Is there a way to point a group of documents to the same stored content fields? Or is there a different way to influence the relevance depending on the current search context? -- View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351.html Sent from the Solr - User mailing list archive at Nabble.com.
Fails to index if unique field has special characters
Hi, We are migrating from Solr 4.6 standalone to Solr 4.7 in cloud mode; while reindexing the documents we are getting the following error. This happens when the unique key contains special characters. This was not noticed in version 4.6 standalone mode, so we are not sure if this is a version problem or a cloud issue. An example of the unique key is given below: http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html Exception Stack Trace: ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296) at org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58) at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandle Thanks, Ayush
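[Context for this error, as a hedged aside rather than a confirmed diagnosis: in SolrCloud, CompositeIdRouter treats '!' in a uniqueKey as a shard-key separator, so an id containing several '!' characters decomposes into more parts than a two-level router expects, which matches the ArrayIndexOutOfBoundsException seen above. A tiny illustration (not Solr code, just String.split on a shortened form of the id):]

```java
// Hypothetical illustration of why '!' in a uniqueKey is special in
// SolrCloud: the composite-id router splits the id on '!' to find the
// shard key, and this id yields more parts than a two-level scheme expects.
public class BangSplitDemo {
    public static void main(String[] args) {
        String id = "Blog/smrity!!**)))!miami_dolphins";
        // limit -1 keeps trailing empty strings, like a naive parser would
        String[] parts = id.split("!", -1);
        System.out.println(parts.length); // prints 4: three '!' -> four parts
    }
}
```

Standalone (non-cloud) Solr never routes on the id, which would explain why 4.6 standalone accepted these keys.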
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
a correction: actually, when I tested the above change, I had so little data that it didn't trigger sub-shard slicing and thus merging of the slices. Still, it looks as if somewhere in the map-reduce contrib code there is a link to which Lucene version to use. Wolfgang, do you happen to know where that other Version.* is specified? On Thu, Apr 10, 2014 at 12:59 PM, Dmitry Kan solrexp...@gmail.com wrote: … -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Re: best way to contribute solr??
Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE, etc. for Solr/Lucene. In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the build report each night is a coverage report; you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds Click on the clover test coverage and pick something; track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff, especially when you're just starting out. So choose the simplest thing you can for the first go, to get familiar with the process, if you want to try this. Another place to start is... the user's list. Pick one question a day, research it, and try to provide an answer. Clearly label your responses with the degree of certainty you have. Another caution: sometimes you'll research something and get back to the list to discover it's already been answered, but you'll have gained the knowledge, and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developers are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of a processing pipeline for Thai.
More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (you need to ask permission to edit the Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and a recent version of Perl? Can you do a small demo that uses a Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir, i will mail to solr-user only. I am feeling so thankful to you for all your help. I am a java developer with a good knowledge of perl, working on solr; actually, i just started working on solr for the geospatial search (not using JTS) only. To be very frank, I learned about faceting from Mr Yonik's tutorial, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? And yesterday i subscribed to solr-start as well.
And sir, what do you mean by *Create a basic project using that library and latest version of Solr*? With Regards Aman Tandon On Thu, Apr 10, 2014 at 11:14 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Aman, Nice of you to want to help. Let's keep the discussion in the user mailing list as opposed to the developer one (most of the people are on both). What is your skill set? Are you familiar with particular languages? If so, the easiest way to contribute would be the following: 1) Find all the Solr client libraries in the language you are most familiar with (PHP, Java, Perl, Python, etc) 2) Create a basic project using that library and the latest version of Solr. Maybe using the Solr tutorial as a baseline and showing how to do the same steps in the client instead of with the command line/curl. 3) Write a blog post about what you learned, whether the library supports the latest Solr well and whether it supports the latest features of Solr (e.g. Schemaless mode, Near-Real-Time, SolrCloud). If that does not appeal, give an example of where your skills are strongest and I am sure there is a way for you to contribute. Regards, Alex.
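[As a loose illustration of the "same steps in the client instead of curl" idea, here is a sketch of the kind of request a client demo for the geospatial case would issue. The endpoint, collection name, and sfield value are assumptions; this only builds the URL string, so it runs without a live Solr.]

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build a Solr /select URL with a geofilt filter query, the
// request a Perl/Java client demo would send. "store" as the spatial
// field and "collection1" are illustrative, not from the thread.
public class GeoQueryDemo {
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/collection1/select"
                + "?q=" + enc("*:*")
                + "&fq=" + enc("{!geofilt sfield=store pt=45.15,-93.85 d=5}")
                + "&wt=json";
        System.out.println(url);
    }
}
```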
Re: Tomcat creates a thread for each SOLR core
Trying to fit 1,000 cores in 6G of memory is... interesting. That's a lot of stuff in a small amount of memory. I hope these cores' indexes are tiny. The lazy-loading bit for cores has a price. The first user in will pay the warmup penalty for that core while it loads. This may or may not be noticeable but be aware of it. You may or may not want autowarming in place. You can also specify how many cores are kept in memory at one time, they go into an LRU cache and are aged out after they serve their last outstanding request. BTW, current Java practice seems to be setting Xmx and Xms to the same value, 6G in your case. Good Luck! Erick On Thu, Apr 10, 2014 at 12:14 AM, Atanas Atanasov atanaso...@gmail.com wrote: Thanks for the quick responses, I have allocated 1GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=*false* transient=*true *options for each core and see what will happen. 
Those are the exceptions from the log files: SEVERE: Servlet.service() for servlet [default] in context with path [/solrt] threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:450) at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:315) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) AND SEVERE: null:ClientAbortException: java.net.SocketException: Software caused connection abort: socket write error at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) at 
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) at sun.nio.cs.StreamEncoder.implFlush(Unknown Source) at sun.nio.cs.StreamEncoder.flush(Unknown Source) at java.io.OutputStreamWriter.flush(Unknown Source) at org.apache.solr.util.FastWriter.flush(FastWriter.java:137) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused
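[Erick's Xms=Xmx advice, expressed as a Tomcat config fragment; the file name follows the usual Tomcat convention, and 6g matches the heap size discussed above. Adjust to your box.]

```shell
# $CATALINA_BASE/bin/setenv.sh -- picked up by catalina.sh at startup
export CATALINA_OPTS="$CATALINA_OPTS -Xms6g -Xmx6g"
```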
Re: Getting huge difference in QTime for terms.lower and terms.prefix
Please provide suggestions on what could be the reason for this. Thanks, On Thu, Apr 10, 2014 at 2:54 PM, Jilani Shaik jilani24...@gmail.com wrote: Hi, When I query the terms component with terms.prefix, the QTime for it is 100 milliseconds, whereas the same query with terms.lower gives a QTime of 500 milliseconds. I am using Solr Cloud. In both cases I am giving terms.limit=60 and terms.sort=index. Query1 Params: terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key QTime: 100 milliseconds Query2 Params: terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key QTime: 500 milliseconds The response gives the same terms in both queries, but the QTime is different. Please let me know why there is a difference in QTime between the two approaches. Thanks, Jilani
Re: Tomcat creates a thread for each SOLR core
Thanks for the tip, I already set the core properties. Now Tomcat has only 27 threads after startup, which is awesome. Works fine, first search is not noticeably slower than before. I'll put equal values for Xmx and Xms and see if there will be any difference. Regards, Atanas On Thu, Apr 10, 2014 at 5:11 PM, Erick Erickson erickerick...@gmail.com wrote: …
Re: Shared Stored Field
Hmmm, I scanned your question, so maybe I missed something. It sounds like you have a fixed number of filters known at index time, right? So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost etc) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one? Of course I may be way off base here, but BTW, you could use dynamic fields to avoid having to pre-define the maximum number of boost fields, something like this in my example: <dynamicField name="*_boost" type="float" indexed="true" stored="false"/> Best Erick On Thu, Apr 10, 2014 at 4:30 AM, StrW_dev r.j.bamb...@structweb.nl wrote: …
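[To make Erick's sketch concrete; the field and filter names below are illustrative, following his f1_boost example, and assume the schema has a float fieldType named "float":]

```xml
<!-- schema.xml: one dynamic float boost field per filter context,
     e.g. f1_boost, f2_boost, populated at index time -->
<dynamicField name="*_boost" type="float" indexed="true" stored="false"/>
```

A query filtered for context F1 could then multiply relevance by that field, e.g. `defType=edismax&fq=context:F1&boost=field(f1_boost)`, where `boost` is edismax's multiplicative boost parameter.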
Re: best way to contribute solr??
thanks sir, i always smile because people here are always ready to help; i am thankful to all. And yes, i started learning by reading at least 50-60 mails daily to increase my knowledge, and i give my suggestion if i am familiar with the topic; people here correct me as well if i am wrong. I know it will take time, but someday i will contribute as well, and thanks for the setup, it will be quite helpful. In my office i am using solr 4.2 with tomcat; right now i am stuck because i don't know how to integrate solr 4.7 with my tomcat. The problem for me is that i am familiar with the cores architecture of solr 4.2, in which we defined every core name as well as its instanceDir, but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.com wrote: …
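[On the solr 4.2-vs-4.7 core setup question raised at the top of this message: starting with Solr 4.4, cores are usually auto-discovered rather than enumerated in solr.xml. A rough before/after, with names and paths illustrative:]

```xml
<!-- Old style (4.2): each core listed explicitly inside solr.xml -->
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0"/>
</cores>
```

With discovery mode (4.4+), you instead drop a core.properties file into each core directory under the Solr home; `name=core0` is a typical minimal content, and the directory name supplies the instance dir.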
Re: Tomcat creates a thread for each SOLR core
I don't expect having equal values to make a noticeable difference, except possibly in some corner cases. Setting them equal is mostly for avoiding surprises... Erick On Thu, Apr 10, 2014 at 7:17 AM, Atanas Atanasov atanaso...@gmail.com wrote: …
Re: Tomcat creates a thread for each SOLR core
Hi Atanas, I have a question: how do you know how many threads Tomcat has?

Thanks
Aman Tandon

On Thu, Apr 10, 2014 at 7:53 PM, Erick Erickson erickerick...@gmail.com wrote:

I don't expect having equal values to make a noticeable difference, except possibly in some corner cases. Setting them equal is mostly for avoiding surprises... Erick

On Thu, Apr 10, 2014 at 7:17 AM, Atanas Atanasov atanaso...@gmail.com wrote:

Thanks for the tip, I already set the core properties. Now Tomcat has only 27 threads after startup, which is awesome. It works fine; the first search is not noticeably slower than before. I'll put equal values for Xmx and Xms and see if there is any difference. Regards, Atanas

On Thu, Apr 10, 2014 at 5:11 PM, Erick Erickson erickerick...@gmail.com wrote:

Trying to fit 1,000 cores in 6G of memory is... interesting. That's a lot of stuff in a small amount of memory. I hope these cores' indexes are tiny. The lazy-loading bit for cores has a price: the first user in will pay the warmup penalty for that core while it loads. This may or may not be noticeable, but be aware of it. You may or may not want autowarming in place. You can also specify how many cores are kept in memory at one time; they go into an LRU cache and are aged out after they serve their last outstanding request. BTW, current Java practice seems to be setting Xmx and Xms to the same value, 6G in your case. Good luck! Erick

On Thu, Apr 10, 2014 at 12:14 AM, Atanas Atanasov atanaso...@gmail.com wrote:

Thanks for the quick responses. I have allocated 1 GB min and 6 GB max memory to Java. The cache settings are the default ones (maybe this is a good point to start). All cores share the same schema and config. I'll try setting the loadOnStartup=false, transient=true options for each core and see what will happen.
Re: Tomcat creates a thread for each SOLR core
Hi, I see the threads of the tomcat7.exe process in the Windows Task Manager.

Regards,
Atanas Atanasov
Re: Tomcat creates a thread for each SOLR core
Okay, I am a CentOS user in the office, Windows at home :D

Thanks
Aman Tandon
Re: Shared Stored Field
Erick Erickson wrote:
So why not index these boosts in separate fields in the document (e.g. f1_boost, f2_boost etc) and use a function query (https://cwiki.apache.org/confluence/display/solr/Function+Queries) at query time to boost by the correct one?

Well, it's basically one multivalued field that can have unlimited values, with multiple values per document (about 8 on average). In that case we would have to add a boost field for each of the values in a document, so in general we would get an unlimited number of dynamic fields in the index. But is it possible to select a different boost field depending on the current filter query?

--
View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130399.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Shared Stored Field
bq: But is it possible to select a different boost field depending on the current filter query?

Well, you're constructing the URL somewhere, so you can choose the right boost there, can't you?

I don't understand this bit: "Well its basically one multivalued field that can have unlimited values and has multiple per document (on average like 8)". The _values_ aren't at issue; it's just the name of the field. You can have lots of dynamic fields defined in your documents and it's not too expensive. Don't go wild here; when you get up into the hundreds, maybe you should think about it a bit.

I feel I'm missing something; some concrete examples would help a lot.

Best,
Erick
Re: Tomcat creates a thread for each SOLR core
On 4/10/2014 12:40 AM, Atanas Atanasov wrote:

I need some help. After updating to SOLR 4.4, the Tomcat process is consuming about 2 GB of memory, and CPU usage is about 40% for about 10 minutes after the start. However, the bigger problem is that I have about 1000 cores, and it seems that a thread is created for each core. The process has more than 1000 threads and everything is extremely slow. Creating or unloading a core, even without documents, takes about 20 minutes. Searching is more or less good, but storing also takes a lot. Is there some configuration I missed or got wrong? There aren't many calls. I use 64-bit Tomcat 7, SOLR 4.4, and the latest 64-bit Java. The machine has 24 GB of RAM and a CPU with 16 cores, and runs Windows Server 2008 R2. The index is updated every 30 seconds / 10,000 documents. I hadn't checked the number of threads before the update, because I didn't have to; it was working just fine. Any suggestion will be highly appreciated; thank you in advance.

If creating a core takes 20 minutes, that sounds to me like the JVM is doing constant full garbage collections to free up enough memory for basic operation. It could also be explained by temporary work threads having to wait to execute because the servlet container will not allow them to run.

When indexing is happening, each core will set aside some memory for buffering index updates. By default, the value of ramBufferSizeMB is 100. If all your cores are indexing at once, multiply the indexing buffer by 1000 and you'll require 100 GB of heap memory. You'll need to greatly reduce that buffer size. This buffer was 32MB by default in 4.0 and earlier; if you are not setting this value, this change alone might fully explain what you are seeing.

https://issues.apache.org/jira/browse/SOLR-4074

What version did you upgrade from? Solr 4.x is a very different beast than earlier major versions. I believe there may have been some changes made to reduce memory usage in versions after 4.4.0.
The Jetty that comes with Solr is configured to allow 10,000 threads. Most people don't have that many, even on a temporary basis, but bad things happen when the servlet container will not allow Solr to start as many as it requires. I believe the typical default maxThreads value you'll find in a servlet container config is 200.

Erick's right about a 6 GB heap being very small for what you are trying to do. Putting 1000 cores on one machine is something I would never try. If it became a requirement I had to deal with, I wouldn't try it unless the machine had a lot more CPU cores, hundreds of gigabytes of RAM, and a lot of extremely fast disk space. If this worked before the Solr upgrade, I'm amazed. Congratulations to you for fine work!

NB: Oracle Java 7u25 is what you should be using. 7u40 through 7u51 have known bugs that affect Solr/Lucene. These should be fixed in 7u60; a pre-release of that is available now, and it should be generally available in May 2014.

Thanks,
Shawn
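The ramBufferSizeMB setting Shawn refers to is a per-core value in solrconfig.xml's indexConfig section; a minimal sketch of reducing it (the 8 MB figure is purely illustrative, not a recommendation):

```xml
<!-- solrconfig.xml: indexing buffer allocated per core while updates are in
     flight; with ~1000 cores, the post-4.0 default of 100 multiplies badly -->
<indexConfig>
  <ramBufferSizeMB>8</ramBufferSizeMB>
</indexConfig>
```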
Re: Shared Stored Field
Erick Erickson wrote:
Well, you're constructing the URL somewhere, you can choose the right boost there can't you?

Yes, of course! As an example: we have one filter field called FILTER which can have unlimited values across all documents. Each document has on average 8 values set for FILTER (e.g. FILTER [1,2,...,8]). So we could add boost fields for each of these values, for example B_1:1.0, ..., B_7:5.0, and use those at query time. That is your suggestion, correct?

So each document has on average 8 of these dynamic fields, while over the whole index we have an unlimited number of these fields. What would this mean for performance?

--
View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130411.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Were changes made to facetting on multivalued fields recently?
Here are the field definitions for both our old and new index... as you can see, they are identical. We've been using this chain and field type since Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time, but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySQL that contains entries such as "4,1" that get stored in a multivalued field after replacing commas by spaces.

OLD (4.6.1):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true"/>

NEW (4.7.1):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true"/>

It looks like the /analysis/field handler is not active in our current setup. I will look into this and perform additional checks later, as we are currently doing a full reindex of our DB. Thanks for your time.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-09-14 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

On 4/9/2014 2:15 PM, Erick Erickson wrote:
Right, but the response in the doc when you make a request is almost, but not quite totally, unrelated to how facet values are tallied. It's all about what tokens are actually in your index, which you can see in the schema browser...

Supplement to what Erick has told you: SOLR-5512 seems to be related to facets using docValues. The commit for that issue looks like it only touches on that specifically. If you do not have (and never have had) docValues on this field, then SOLR-5512 should not apply. I am reasonably sure that for facets on fields with docValues, your facets would reflect the *stored* information, not the indexed information. Finally, I don't think that docValues work on fieldtypes whose class is solr.TextField, which is the only class that can have an analysis chain that would turn "4 5 1" into three separate tokens. The response that you shared where the value is "4 5 1" looks like there is only one value in the field -- so for that document, it is effectively the same as one that is single-valued.

Bottom line: it looks like either your analysis chain is working differently in the newer version, or you have documents in your newer index that are not in the older one. Can you share the field and fieldType definitions from both versions? Did your luceneMatchVersion change with the upgrade? If you are using DIH to populate your index, can you also share your DIH config?

Thanks,
Shawn
Re: Facet search and growing memory usage
On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
This does not happen with the 'old' method 'facet.method=enum' - memory usage is stable and Solr is unbreakable with my hold-reload test. The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields.

This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast. If the operating system has to read the data off the disk, it will be *very* slow.

If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine. The default method (fc) will create the memory structure that Toke has mentioned for *every* field that gets used for facets. If there are only a few fields used for faceting and they have low cardinality, this is not a problem, and the speedup is usually worth the extra heap memory usage. With 40 facets, that is not supportable.

Thanks,
Shawn
Re: Shared Stored Field
So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in another doc, etc.? What's so confusing is that in your first e-mail, you said:

bq: This denormalization grows the index size with a factor 100 in worse case.

Which I took to mean you have at most 100 of these fields.

Please look at the function query page I referenced and try a few things so we can deal with specific questions. You can put the results of a _query_ in a function query, so you could probably just form a sub-query that returns a score that you in turn use to boost the doc.

Best,
Erick
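One way the per-filter boost could be wired up, sketched with the hypothetical field names from this thread (FILTER, B_3): since the client builds the URL, it can pick the boost field that matches the active filter, for example via edismax's multiplicative boost parameter:

```
/solr/collection1/select?q=shoes
    &fq=FILTER:3
    &defType=edismax
    &boost=field(B_3)
```

The boost parameter multiplies each document's score by the value of the named function, so documents carrying a higher B_3 value rank higher whenever the FILTER:3 filter is active.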
Re: Were changes made to facetting on multivalued fields recently?
On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
The source is a column in MySql that contains entries such as 4,1 that get stored in a Multivalued fields after replacing commas by spaces

Just so you know, there's nothing here that would require the field to be multivalued. WhitespaceTokenizerFactory does not create multiple field values; it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field.

What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.

Thanks,
Shawn
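A tiny illustration (plain Python, not SolrJ) of the two distinct steps being discussed: the SQL-side comma-to-space replacement produces a single value, and whitespace tokenization then yields multiple terms from that one value.

```python
# One stored value in, multiple indexed terms out: this is why the field
# facets as separate ids even though SolrJ only ever sent a single value.
raw = "4,1"                      # value as it sits in the MySQL column
stored = raw.replace(",", " ")   # what a SQL REPLACE() would emit
terms = stored.split()           # what a whitespace tokenizer produces
print(stored)  # 4 1
print(terms)   # ['4', '1']
```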
Re: Fails to index if unique field has special characters
Hi Ayush, I think the problem is the '!' in your key (the composite-id form is something like IBM!12345). The exclamation mark ('!') is significant here, as it delimits the prefix used to determine which shard to direct the document to. https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

On Thursday, April 10, 2014 2:35 PM, Cool Techi cooltec...@outlook.com wrote:

Hi, we are migrating from Solr 4.6 standalone to Solr 4.7 cloud version, and while reindexing the documents we are getting the following error. This happens when the unique key has special characters; it was not noticed in version 4.6 standalone mode, so we are not sure if this is a version problem or a cloud issue. An example of the unique key is given below:

http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html

Exception stack trace:

ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296)
	at org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58)
	at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
	at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandle

Thanks, Ayush
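A rough illustration (plain Python, not the actual CompositeIdRouter code) of why the extra '!' characters matter: the router splits the id on '!' to find a routing prefix, and a key containing more '!' parts than the one- or two-level composite scheme expects is consistent with the ArrayIndexOutOfBoundsException above.

```python
# Hypothetical sketch: text before '!' is treated as a shard-routing prefix.
def parts(doc_id):
    return doc_id.split("!")

print(parts("IBM!12345"))  # ['IBM', '12345'] -- two parts, routes fine
weird = "http://example.com/smrity!!**)))!page.html"
print(len(parts(weird)))   # 4 -- more parts than the composite-id scheme expects
```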
RE: Facet search and growing memory usage
Shawn Heisey [s...@elyograg.org] wrote:
On 4/9/2014 11:53 PM, Toke Eskildsen wrote:
The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields.

This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast.

Very fast compared to not cached, yes, but still slow compared to fc for high cardinality. The processing overhead per term is a great deal larger for enum. I recently ran some tests with Solr's different faceting methods for 50M+ values, but stopped measuring for enum as it took so much longer than the other methods -- and that was for a fully cached index.

If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine.

I do not understand how the number of facets has any influence on the choice between enum and fc. As Solr (sadly) does not support combined structures for multiple facets, each facet is independent from the others. Shouldn't the choice be made for each individual facet?

- Toke Eskildsen
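Toke's per-facet point maps onto Solr's per-field parameter convention: facet.method can be overridden per field with f.<fieldname>.facet.method, so the enum/fc trade-off can indeed be decided facet by facet (field names here are illustrative):

```
/solr/collection1/select?q=*:*&facet=true
    &facet.field=country&f.country.facet.method=enum
    &facet.field=title&f.title.facet.method=fc
```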
RE: Were changes made to facetting on multivalued fields recently?
The SQL query contains a REPLACE statement that does this.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: April-10-14 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Were changes made to facetting on multivalued fields recently?

What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working.
Re: Facet search and growing memory usage
fwiw, facets are much less heap-greedy when counted on docValues-enabled fields; they should not hit UnInvertedField in that case. Try them. On Thu, Apr 10, 2014 at 8:20 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Shawn Heisey [s...@elyograg.org] wrote: On 4/9/2014 11:53 PM, Toke Eskildsen wrote: The memory allocation for enum is both low and independent of the number of unique values in the facets. The trade-off is that it is very slow for medium- to high-cardinality fields. This is where it is extremely beneficial to have enough RAM to cache your entire index. The term list must be enumerated for every facet request, but if the data is already in the OS disk cache, this is very fast. Very fast compared to not cached, yes, but still slow compared to fc, for high-cardinality. The processing overhead per term is a great deal larger for enum. I recently ran some tests with Solr's different faceting methods for 50M+ values, but stopped measuring for enum as it took so much longer than the other methods. For a fully cached index. If facets are happening on lots of fields and are heavily utilized, facet.method=enum should be used, and there must be plenty of RAM to cache all or most of the index data on the machine. I do not understand how the number of facets has any influence on the choice between enum and fc. As Solr (sadly) does not support combined structures for multiple facets, each facet is independent from the others. Shouldn't the choice be done for each individual facet? - Toke Eskildsen -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
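Mikhail's docValues suggestion amounts to a schema change. A minimal sketch for a Solr 4.x schema (the field name and type here are illustrative, not from this thread) might look like:

```xml
<!-- Enabling docValues lets facet counting use on-disk column data
     instead of building an UnInvertedField on the Java heap.
     A full reindex is required after this change. -->
<field name="category" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```

Note that in Solr 4.x only certain field types (string, trie numerics, and similar) support docValues, so this is a sketch rather than a drop-in fix for an arbitrary field.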
Another japanese analysis problem
My analysis chain includes CJKBigramFilter on both the index and query side. I have outputUnigrams enabled on the index side, but it is disabled on the query side. This has resulted in a problem with phrase queries. This is a subset of my index analysis for the three terms you can see in the ICUNF step, separated by spaces: https://www.dropbox.com/s/9q1x9pdbsjhzocg/bigram-position-problem.png Note that in the CJKBF step, the second unigram is output at position 2, pushing the English terms to 3 and 4. When the customer runs a phrase query (lucene query parser) for the first two terms on this specific field, it doesn't match, because the query analysis doesn't output the unigrams and therefore the positions don't match. I would have expected both unigrams to be at position 1. Is this a bug or expected behavior? Thanks, Shawn
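The asymmetric configuration described above might look roughly like this in the schema. This is a sketch only: the thread mentions an ICU normalization step that is not reproduced here, and the tokenizer choice is an assumption:

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- index side emits both bigrams and unigrams -->
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- query side emits bigrams only -->
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="false"/>
  </analyzer>
</fieldType>
```

The position mismatch in the question arises precisely because the two analyzers emit different numbers of tokens (and therefore different positions) for the same input.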
Re: Were changes made to facetting on multivalued fields recently?
bq: The SQL query contains a Replace statement that does this Well, I suspect that's where the issue is. The facet values being reported include: <int name="4,1">134826</int> which indicates that the incoming text to Solr still has the commas. Solr is seeing the commas and all. You can cure this by using PatternReplaceCharFilterFactory and doing the substitution at index time if you want to. That doesn't clarify why the behavior has changed though, but my supposition is that it has nothing to do with Solr, and something about your SQL statement is different. Best, Erick On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: The SQL query contains a Replace statement that does this -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: April-10-14 11:30 AM To: solr-user@lucene.apache.org Subject: Re: Were changes made to facetting on multivalued fields recently? On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote: Here are the field definitions for both our old and new index... as you can see they are identical. We've been using this chain and field type starting with Solr 1.4 and never had any problem. As for the documents, both indexes are using the same data source. They could be slightly out of sync from time to time but we tend to index them on a daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. The source is a column in MySql that contains entries such as "4,1" that get stored in a multivalued field after replacing commas by spaces OLD (4.6.1): <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> <field name="ad_job_type_id" type="text_ws" indexed="true" stored="true" required="false" multiValued="true" /> Just so you know, there's nothing here that would require the field to be multivalued.
WhitespaceTokenizerFactory does not create multiple field values, it creates multiple terms. If you are actually inserting multiple values for the field in SolrJ, then you would need a multivalued field. What is replacing the commas with spaces? I don't see anything here that would do that. It sounds like that part of your indexing is not working. Thanks, Shawn
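Erick's PatternReplaceCharFilterFactory suggestion would look roughly like this in the schema, reusing the thread's text_ws type. The pattern is an assumption based on the comma-separated "4,1" values described above:

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- rewrite commas to spaces before tokenizing,
         so "4,1" produces the terms "4" and "1" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="," replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Faceting runs on indexed terms, so with this in place the facet values would come back without commas; a reindex is required for existing documents.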
filter capabilities are limited?
hi, is it possible to compare two fields in a Solr filter? What is the best way to build a filter like x - y = 0, i.e. to get all records where x = y? -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filter capabilities are limited?
Where are the values coming from? You might be able to use the _val_ hook for function queries if they're in fields in the doc. Or if it's a constant you pass in... Let's claim it's just a value in your document. Can't you just form a filter query on it? Details matter; there's not enough info here to say anything definitive. Best, Erick On Thu, Apr 10, 2014 at 10:56 AM, horot roman.she...@gmail.com wrote: hi whether it is possible to compare two variables in the Solr filter? As best to build a filter: x - y = 0, i.e. get all records if x = y -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458.html Sent from the Solr - User mailing list archive at Nabble.com.
Relevance/Rank
Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name:123-87458. I need to get the exact match (in this case SKU) first in the results, but I may also want to display Name matches in the list which are not exact matches but where the value can be found somewhere in the Name. In simple terms, for some customers I want to rank SKU as 1 and Name as 2 in the results, and for other customers rank Name as 1 and SKU as 2. Is this possible? I tried boosting but it seems it is for text; correct me if I am wrong in my understanding, and any example will be really appreciated. I am getting confused after going through different sites. Thanks
Re: filter capabilities are limited?
Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x <> '' and y <> '' and x = y. It's something like q=x:* AND y:* AND x:y, but the problem is that the fields can not be compared in direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Relevance/Rank
What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filters (fq) clauses at all, where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks
Re: Relevance/Rank
On 4/10/2014 12:49 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Your query is being sent with the fq parameter. Filter queries do not affect scoring at all. They are purely for filtering. You would need to move this to the q parameter (query) in order for what's there to affect relevancy ranking. You will very likely want to look over this: https://wiki.apache.org/solr/SolrRelevancyFAQ Thanks, Shawn
Re: filter capabilities are limited?
Uhhhm, did you look at function queries at all? That should work for you. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Apr 10, 2014 at 11:51 AM, horot roman.she...@gmail.com wrote: Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x '' and y '' and x = y. it's something like q=x:* AND y:* AND x:y but the problem is that the field can not be compared direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Relevance/Rank
Eric, Below is the query part select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name match record first in the list; I am always getting the SKU matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filters (fq) clauses at all, where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results.
Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks
Re: Range query and join, parse exception when parens are added
On 4/8/2014 22:00 GMT Shawn Heisey wrote: On 4/8/2014 1:48 PM, Mark Olsen wrote: Solr version 4.2.1 I'm having an issue using a join query with a range query, but only when the query is wrapped in parens. This query works: {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] However this query does not (just wrapping with parens): ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) The {!join...} part of that is a localParam. It must precede the entire query. If you want to add parens, here's how you would need to do it: {!join from=member_profile_doc_id to=id}(language_proficiency_id_number:[30 TO 50]) With the left parenthesis where you placed it, the localParam is considered part of the query itself. It becomes incorrect Solr syntax at that point. Thanks, Shawn Shawn, Thank you for the response and I apologize for the delayed reply. If I do the query with the localParam as you show it works, however as my queries become more complex with multiple terms and fields I'm finding it difficult to get the join localParam to work. For example, when I was first playing with the above range query, I was able to get this to work with nested localParams within the parentheses: +({!join from=member_profile_doc_id to=id}language_noun:english ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) This would work, but as soon as I used the range query with the square brackets it would fail, hence why I thought it was an issue. As I've been trying queries it seemed to me that I need to have the localParam before each field:term pair to get results. Another example, this first query yields the results that I am expecting. Note the parentheses wrapping each localParam and field:term pair.
In this case there are two documents that both have the same member_profile_doc_id that are matching the two queries (language and certification). +({!join from=member_profile_doc_id to=id}language_noun:english) +({!join from=member_profile_doc_id to=id}certification_authority_id_number:50) If I move the localParams outside and no longer declare them with each field:term pair then no results are returned, example query: {!join from=member_profile_doc_id to=id}(+language_noun:english +certification_authority_id_number:50) Unfortunately the documentation for the join localParams only gives a simple field:term example so I've had to experiment with more complex queries to figure out how to get it to work. `Mark
Solr Admin core status - Index is not Current
Hi there I am using solrcloud (4.3). I am trying to get the status of a core from solr using (localhost:8000/solr/admin/cores?action=STATUS&core=core) and i get the following output <int name="numDocs">100</int> <int name="maxDoc">102</int> <int name="deletedDocs">2</int> <long name="version">20527</long> <int name="segmentCount">20</int> *<bool name="current">false</bool>* What does current mean? A few of the cores are optimized (with segment count 1) and show current = true and the rest show current as false. If i have to make the core current, what should i do? Is it a big alarm if the value is false? -- Best -- C
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
There’s no such other location in there. BTW, you can disable the mtree merge via --reducers=-2 (or --reducers=0 in old versions). Wolfgang. On Apr 10, 2014, at 3:44 PM, Dmitry Kan solrexp...@gmail.com wrote: a correction: actually when I tested the above change I had so little data, that it didn't trigger sub-shard slicing and thus merging of the slices. Still, looks as if somewhere in the map-reduce contrib code there is a link to what lucene version to use. Wolfgang, do you happen to know where that other Version.* is specified? On Thu, Apr 10, 2014 at 12:59 PM, Dmitry Kan solrexp...@gmail.com wrote: Thanks for responding, Wolfgang. Changing to LUCENE_43: IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_43, null); didn't affect the index format version because, I believe, if the format of the index to merge has been of a higher version (4.1 in this case), it will merge to the same and not a lower version (4.0). But the format version certainly could be read from the solrconfig, you are right. Dmitry On Wed, Apr 9, 2014 at 11:51 PM, Wolfgang Hoschek whosc...@cloudera.com wrote: There is a current limitation in that the code doesn't actually look into solrconfig.xml for the version. We should fix this, indeed. See https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L100-101 Wolfgang. On Apr 8, 2014, at 11:49 AM, Dmitry Kan solrexp...@gmail.com wrote: Hello, When we instantiate the MapReduceIndexerTool with the collection's conf directory, we expect that the Lucene version is respected and the index gets generated in a format compatible with the defined version. This does not seem to happen, however. Checking with luke: the expected Lucene index format: Lucene 4.0 the output Lucene index format: Lucene 4.1 Can anybody shed some light onto the semantics behind specifying the Lucene version in this context?
Does this have something to do with what version of solr core is used by the morphline library? Thanks, Dmitry -- Forwarded message -- Dear list, We have been generating solr indices with the solr-hadoop contrib module (SOLR-1301). Our current solr in use is of 4.3.1 version. Is there any tool that could do the backward conversion, i.e. 4.7 -> 4.3.1? Or is the upgrade the only way to go? -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
Re: filter capabilities are limited?
It sounds like you can make it work with the frange qparser plugin: fq={!frange l=0 u=0}sub(field(a),field(b)) Joel Bernstein Search Engineer at Heliosearch On Thu, Apr 10, 2014 at 3:36 PM, Erick Erickson erickerick...@gmail.comwrote: Uhhhm, did you look at function queries at all? That should work for you. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Apr 10, 2014 at 11:51 AM, horot roman.she...@gmail.com wrote: Values come from the Solr doc. I can not get to compare the two fields to get some result. The logic of such a query: x '' and y '' and x = y. it's something like q=x:* AND y:* AND x:y but the problem is that the field can not be compared direct form. If someone knows how to solve this problem please write examples. -- View this message in context: http://lucene.472066.n3.nabble.com/filter-capabilities-are-limited-tp4130458p4130472.html Sent from the Solr - User mailing list archive at Nabble.com.
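Joel's frange filter can be sketched as a full request. The host, collection, and field names a and b are illustrative; both fields must be single-valued and numeric for the function to evaluate per document:

```text
http://localhost:8983/solr/collection1/select
    ?q=*:*
    &fq={!frange l=0 u=0}sub(field(a),field(b))
```

sub(a,b) computes a - b for each document; constraining that value to the range [0, 0] with the lower bound l and upper bound u keeps only documents where a = b.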
Re: Nested documents, block join - re-indexing a single document upon update
On Sun, Mar 16, 2014 at 2:47 PM, danny teichthal dannyt...@gmail.com wrote: To make things short, I would like to use block joins, but to be able to index each document of the block separately. Is it possible? No way. Use query-time {!join}, or denormalize and then use field collapsing. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Collections Design
All, What is the best practice or guideline towards considering multiple collections particularly in the solr cloud env? Thanks Srikanth
Re: Range query and join, parse exception when parens are added
Mark: first off, the details matter. Nothing in your first email made it clear that the {!join} query you were referring to was not the entirety of your query param -- which is part of the confusion and was a significant piece of Shawn's answer. Had you posted the *exact* request you were sending (with all params) and the full response you got, the root cause of your problem would have been a lot more obvious to some folks very quickly. As Shawn mentioned, the {!...} syntax involves localparams and invoking a named parser -- normally this syntax is the *first* thing in a query string, and causes the *entire* string to be parsed by that parser. This is why something like this should work fine for you (please confirm if it does not)... q={!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] ...because the {! at the beginning of the param value tells solr to let that parser (in this case join) parse the entire string, using the localparam options specified (from=member_profile_doc_id to=id). The join parser then delegates to the lucene parser for its body (language_proficiency_id_number:[30 TO 50]) However ... when the {! syntax is not the first thing in the param value, what happens is that the default parser (lucene) is used -- the lucene parser has a handy feature that supports looking for the {! syntax nested inside of it ... which means you can do things like this... q=+(+{!prefix f=foo}bar -{!term f=yak}wak) -aaa:zzz ...however there is an important caveat to this: when the lucene parser is looking for the {! syntax to know when you want it to delegate to another parser, how does it know if/when you intended for the input to that nested parser to end?
Specifically, in the example above, how does it know if the input to the prefix parser was meant to be bar or bar -{!term f=yak}wak) -aaa:zzz The answer is that it's conservative and assumes the input to the nested parser stops as soon as it sees something that looks like the end of the current clause: whitespace or an open/close paren for example. Which brings us to your specific example... : +({!join from=member_profile_doc_id to=id}language_noun:english : ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) ...in this case, when the lucene parser sees the nested {!join...} parsers it has no problem, because its conservative rules about the end of the clauses match up with what you expect given the simple term queries. If you change those individual term queries to a range query however... +({!join from=member_profile_doc_id to=id}language_noun:english {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) ...in this case, the lucene parser sees the {!join} syntax and delegates to the join parser, but it assumes only the language_proficiency_id_number:[30 portion of the input is meant for that parser, and hangs on to the TO 50] to parse as additional clauses. The join parser doesn't really care about the input, but when it delegates to another nested instance of the lucene parser, the input language_proficiency_id_number:[30 isn't valid because it's the start of an unfinished range query. Does that make sense so far? As for the solution: when you use the {!foo} syntax, the local param v can be used to specify the main input to the nested parser instead of the usual prefix-ish syntax -- and this scopes the input unambiguously...
+({!join from=member_profile_doc_id to=id v='language_noun:english'} {!join from=member_profile_doc_id to=id v='language_proficiency_id_number:[30 TO 50]'}) FWIW, you can also use param dereferencing if it helps make things easier to read for you (and/or if you need to include nested quotes and don't want to deal with the escaping)... q=+({!join from=$from to=id v=$noun} {!join from=$from to=id v=$prof}) &from=member_profile_doc_id &noun=language_noun:english &prof=language_proficiency_id_str:[thirty three TO fifty] -Hoss http://www.lucidworks.com/
Re: Range query and join, parse exception when parens are added
Chris, Thank you for the detailed explanation, this helps a lot. One of my current hurdles is that my search system is in Java, using Lucene Query objects to construct a BooleanQuery which is then handed to Solr. Since Lucene does not know about the LocalParams it's tricky to get them to play properly when dealing with complex queries. My first solution was to prefix the LocalParams to the field name, which worked fine until I ran the range query. Changing to use the v= field of a LocalParam would work from the query structure perspective, however getting that into a Lucene Query object will be a fun exercise. `Mark - Original Message - From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Thursday, April 10, 2014 4:33:19 PM Subject: Re: Range query and join, parse exception when parens are added Mark: first off, the details matter. Nothing in your first email made it clear that the {!join} query you were referring to was not the entirety of your query param -- which is part of the confusion and was a significant piece of Shawn's answer. Had you posted the *exact* request you were sending (with all params) and the full response you got, the root cause of your problem would have been a lot more obvious to some folks very quickly. As Shawn mentioned, the {!...} syntax involves localparams and invoking a named parser -- normally this syntax is the *first* thing in a query string, and causes the *entire* string to be parsed by that parser. This is why something like this should work fine for you (please confirm if it does not)... q={!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50] ...because the {! at the beginning of the param value tells solr to let that parser (in this case join) parse the entire string, using the localparam options specified (from=member_profile_doc_id to=id). The join parser then delegates to the lucene parser for its body (language_proficiency_id_number:[30 TO 50]) However ... when the {! syntax is not the first thing in the param value, what happens is that the default parser (lucene) is used -- the lucene parser has a handy feature that supports looking for the {! syntax nested inside of it ... which means you can do things like this... q=+(+{!prefix f=foo}bar -{!term f=yak}wak) -aaa:zzz ...however there is an important caveat to this: when the lucene parser is looking for the {! syntax to know when you want it to delegate to another parser, how does it know if/when you intended for the input to that nested parser to end? Specifically, in the example above, how does it know if the input to the prefix parser was meant to be bar or bar -{!term f=yak}wak) -aaa:zzz The answer is that it's conservative and assumes the input to the nested parser stops as soon as it sees something that looks like the end of the current clause: whitespace or an open/close paren for example. Which brings us to your specific example... : +({!join from=member_profile_doc_id to=id}language_noun:english : ({!join from=member_profile_doc_id to=id}language_proficiency_id_number:30 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:40 : {!join from=member_profile_doc_id to=id}language_proficiency_id_number:50)) ...in this case, when the lucene parser sees the nested {!join...} parsers it has no problem, because its conservative rules about the end of the clauses match up with what you expect given the simple term queries. If you change those individual term queries to a range query however... +({!join from=member_profile_doc_id to=id}language_noun:english {!join from=member_profile_doc_id to=id}language_proficiency_id_number:[30 TO 50]) ...in this case, the lucene parser sees the {!join} syntax and delegates to the join parser, but it assumes only the language_proficiency_id_number:[30 portion of the input is meant for that parser, and hangs on to the TO 50] to parse as additional clauses. The join parser doesn't really care about the input, but when it delegates to another nested instance of the lucene parser, the input language_proficiency_id_number:[30 isn't valid because it's the start of an unfinished range query. Does that make sense so far? As for the solution: when you use the {!foo} syntax, the local param v can be used to specify the main input to the nested parser instead of the usual prefix-ish syntax -- and this scopes the input unambiguously... +({!join from=member_profile_doc_id to=id v='language_noun:english'} {!join from=member_profile_doc_id to=id v='language_proficiency_id_number:[30 TO 50]'}) FWIW, you can also use param dereferencing if it helps make things easier to read for you (and/or if you need to include nested quotes and don't want to deal with the escaping)... q=+({!join from=$from to=id v=$noun} {!join from=$from to=id v=$prof}) &from=member_profile_doc_id &noun=language_noun:english
Re: best way to contribute solr??
any help related to my previous mail update?? On Thu, Apr 10, 2014 at 7:52 PM, Aman Tandon amantandon...@gmail.com wrote: thanks sir, i always smile when people here are always ready to help, i am thankful to all, and yes i started learning by reading daily at least 50-60 mails to increase my knowledge; i give my suggestion if i am familiar with the topic, and people here correct me as well if i am wrong. I know it will take time but someday i will contribute as well, and thanks for the setup, it will be quite helpful. In my office i am using solr 4.2 with tomcat; right now i am stuck because i don't know how to integrate solr 4.7 with my tomcat, because the problem for me is that i am familiar with the cores architecture of solr 4.2 in which we defined every core name as well as instanceDir, but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.com wrote: Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE etc. for Solr/Lucene In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the nightly build report includes coverage; you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds click on clover test coverage and pick something, track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff when you're just starting out especially. So choose the simplest thing you can for the first go to get familiar with the process if you want to try this. Another place to start is...the user's list. Pick one question a day, research it and try to provide an answer. Clearly label your responses with the degree of certainty you have.
Another caution: you'll research something and get back to the list to discover it's already been answered sometimes, but you'll have gained the knowledge and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developers are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Sure, you can do it in Java too. The difference is that Solr comes with the Java client SolrJ, which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially, if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of a processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and a recent version of Perl? Can you do a small demo that uses a Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues.
Find the fix, contribute it back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir, I will mail to solr-user only. I am feeling so thankful to you for all your help. I am a Java developer with a good knowledge of Perl, working on Solr; actually, I just started working on Solr for geospatial search (not using JTS) only. To be very frank, I learned about faceting from Mr Yonik's tutorial, geospatial (not JTS), indexing, searching and boosting. That's all. What is your suggestion now? And yesterday I subscribed to solr-start as well. And sir, what do you mean by *Create a basic
Re: Relevance/Rank
What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). 
Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, I am looking at boosting to see if I can achieve ranking equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name:123-87458 and I need to get the exact match (in this case SKU) first in the results. But I may also want to display Name matches in the list which are not exact matches, where the value can be found somewhere in the Name. Simply put, for some customers I rank SKU as 1 and Name as 2, and for other customers Name as 1 and SKU as 2 in the results. Is this possible? I tried boosting, but it seems it is for text; correct me if I am wrong in my understanding, and any example will be really appreciated. I am getting confused after going through different sites. Thanks
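As a sketch of the rework Erick describes (the join and catalog clauses are carried over from Ravi's query; the exact boosts are assumptions), the scoring clause moves into q= while the non-scoring restriction stays in fq=:

```
select?q=(SKU:204-161)^2 OR Name:"204-161"
  &fq={!join from=SKU to=SKU fromIndex=Collection2}CatalogName:*Products
  &debug=all
```

Swapping the boosts (Name:"204-161"^2 OR SKU:204-161) would flip which match ranks first, which is the per-customer behavior Ravi asked about; debug=all shows the score explanation for each returned document.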
Re: Relevance/Rank
Hi Ravi, For a better analysis of document ranking, query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy that XML and paste it into http://explain.solr.pl/; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Fri, Apr 11, 2014 at 5:56 AM, Erick Erickson erickerick...@gmail.com wrote: What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. 
You're in luck, the Solr Reference Guide ( https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide ) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks -- With Regards Aman Tandon
Re: Relevance/Rank
Hello Erick, I am confused here: how will the boost not have an effect if he is boosting Name by 2? He is filtering the results and then applying the boost. On Fri, Apr 11, 2014 at 6:12 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Ravi, For a better analysis of document ranking, query the index with these extra parameters, e.g. whole_query&debug=true&wt=xml. Copy that XML and paste it into http://explain.solr.pl/; you can then easily see the ranking analysis in the form of pie charts, showing how much weight is given to each parameter in your Solr config and in the query. On Fri, Apr 11, 2014 at 5:56 AM, Erick Erickson erickerick...@gmail.com wrote: What Shawn said. q=*:* is a constant-score query, i.e. every match has a score of 1.0. fq clauses don't contribute to the score. The boosts you're specifying have absolutely no effect. Move the fq clause to your main query (q=) to see any effect. Try adding debug=all to your query and look at the explanation of how the score is calculated; I suspect you'll find them all 1.0. Best, Erick On Thu, Apr 10, 2014 at 12:53 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Erick, Below is the query part: select?q=*:*&fq={!join%20from=SKU%20to=SKU%20fromIndex=Collection2}(CatalogName:*Products)&fq=(SKU:204-161)%20OR%20(Name:%22204-161%22)&bq=Name:%22204-161%22^2 I am not getting the Name-match record first in the list; I am always getting the SKU-matching record. Any help is really appreciated. Thanks Ravi -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 10, 2014 3:35 PM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank What kind of field is Name? Assuming it's string, you should be able to boost it. Boosts are not relevant to filter (fq) clauses at all; where were you trying to add the boost? 
You need to provide significantly more information to get a more helpful answer. You might review: http://wiki.apache.org/solr/UsingMailingLists bq: I am getting confused after going thru different sites. You're in luck, the Solr Reference Guide ( https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide ) is becoming the collected source of information. Also, here's a sorely-needed set of up-to-date info: http://www.manning.com/grainger/, and Jack Krupansky is publishing an e-book here: http://www.lulu.com/spotlight/JackKrupansky (this is the last link I have, there may be more recent copies). Best, Erick On Thu, Apr 10, 2014 at 11:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi I am looking boosting to help if I can achieve the Rank equal to MS SQL Server. I have a query something like fq=(SKU:123-87458) OR Name: 123-87458 I need to get the Exact Match as first in the results, In this case SKU. But also I can change to display Name in the List which is not exact match but match , the value can be find some where in the Name? In Simple I can Rank SKU as 1 and Name as 2 for some customer and some customer Rank Name as 1 and SKU as 2 in the results. Is this Possible , I tried Boosting but that it seems it is for Text, correct me if I am wrong on understanding and any example will be really appreciated. I am getting confused after going thru different sites. Thanks -- With Regards Aman Tandon -- With Regards Aman Tandon
svn vs GIT
Hi, I am new here. I have a question: why do we prefer svn over git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
deleting large amount data from solr cloud
[solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large amount of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents; I would need to delete around 75% of the data. One solution could be to drop the index completely and re-index, but this is not an option at the moment. We tried to delete the data through a query - say, 1 day's or 1 month's worth of data at a time - but after deleting just 1 month's worth of data, the master node goes out of memory (heap space). Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
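A minimal sketch of the incremental approach (the date field name and range here are assumptions; adapt them to your schema): delete one bounded slice per request via the XML update handler, then commit, so each merge only has to deal with a small batch of deletions:

```
<delete><query>timestamp:[2013-01-01T00:00:00Z TO 2013-02-01T00:00:00Z]</query></delete>
<commit expungeDeletes="true"/>
```

expungeDeletes asks the commit to merge away segments with deleted documents; many small slices with a commit between them may keep the heap needed per step bounded, rather than one huge delete-by-query followed by one massive merge.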
Re: best way to contribute solr??
Put separate issues into separate emails. That way new people will look at the new thread. As it was, it was out of the conversation flow and got lost. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:16 AM, Aman Tandon amantandon...@gmail.com wrote: any help related to my previous mail update?? On Thu, Apr 10, 2014 at 7:52 PM, Aman Tandon amantandon...@gmail.comwrote: thanks sir, i always smile when people here are always ready for help, i am thankful to all, and yes i started learning by reading daily at least 50-60 mails to increase my knowledge gave my suggestion if i am familiar with it, people here correct me as well if i am wrong. I know it will take time but someday i will contribute as well and thanks for setup it will be quite helpful. In my office i am using solr 4.2 with tomcat right now i am stucked because i don't know how to integrate solr 4.7 with my tomcat, because the problem for me is that i am familiar with the cores architecture of solr 4.2 in which we defined the every core name as well as instanceDir but not with solr 4.7. Thanks Aman Tandon On Thu, Apr 10, 2014 at 7:31 PM, Erick Erickson erickerick...@gmail.comwrote: Aman: Here's another helpful resource: http://wiki.apache.org/solr/HowToContribute It tells you how to get the source code, set up an IDE etc. for Solr/Lucene In addition to Alexandre's suggestions, one possibility (but I warn you it can be challenging) is to create unit tests. Part of the build report each night has a coverage, you can get to the latest build here: https://wiki.apache.org/solr/NightlyBuilds click on clover test coverage and pick something, track down what isn't covered (see the clover report link for instance). Warning: You will be completely lost for a while. This is hard stuff when you're just starting out especially. 
So choose the simplest thing you can for the first go to get familiar with the process if you want to try this. Another place to start is...the user's list. Pick one question a day, research it and try to provide an answer. Clearly label your responses with the degree of certainty you have. Another caution: you'll research something and get back to the list to discover its already been answered sometimes but you'll have gained the knowledge and it gets better over time. Best, Erick On Thu, Apr 10, 2014 at 12:03 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks sir, I will look into this. Solr and its developer are all helpful and awesome, i am feeling great. Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:29 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Sure, you can do it in Java too. The difference is that Solr comes with Java client SolrJ which is tested and kept up-to-date. But there could still be more tutorials. For other languages/clients, there is a lot less information available. Especially, if you start adding (human) languages into it. E.g. how to process your own language (if non-English). And there are many more ideas on Slide 26 of http://www.slideshare.net/arafalov/introduction-to-solr-from-bangkok-meetup . As well as an example of processing pipeline for Thai. More of these kinds of things would be useful too. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:52 PM, Aman Tandon amantandon...@gmail.com wrote: Thank you so much sir :) Can i try in java as well? Thanks Aman Tandon On Thu, Apr 10, 2014 at 12:15 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Great, Solr + Perl + Geospatial. There are two Perl clients for Solr listed on the Wiki: http://wiki.apache.org/solr/IntegratingSolr . Are there any more? If yes, add them to the Wiki (need to ask permission to edit Wiki). Are those two listed clients dead or alive? 
Do they work with Solr 4.7.1? Can you make them work with Solr 4.7.1 and recent version of Perl? Can you do a small demo that uses Perl client to index some geospatial information and then do a search for it? I strongly suspect you will hit some interesting issues. Find the fix, contribute back to the Perl library maintainer. Or, at least, clearly describe the issue, if you don't yet know enough to contribute the fix. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Apr 10, 2014 at 1:04 PM, Aman Tandon amantandon...@gmail.com wrote: Okay sir i will mail to solr-user only, I am feeling so thankful to you for all you help, i am java developer with a good knowledge of perl, working on
Re: svn vs GIT
You can find the read-only Git mirror of the Lucene+Solr source code here: https://github.com/apache/lucene-solr . The SVN preference is the Apache Foundation's choice and legacy. Most of the developers' workflows are also built around SVN. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:48 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, I am new here, i have question in mind that why we are preferring the svn more than git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
It's an interesting question. To start from, the copyField copies the source content, so there is no source-related tokenization description. Only the target's one. So, that approach is not suitable. Regarding the lookups/auto-complete. There has been a bunch of various implementations added recently, but they are not really documented. Things like BlendedInfixSuggester are a bit hard to discover at the moment. So, there might be something there if one digs a lot. The other option is to do the tokenization in the UpdateRequestProcessor chain. You could clone a field, and do some processing so that by the time the content hits solr, it's already pre-tokenized into multi-value field. Then, you could have KeywordTokenizer on your collector field and separate URPs sub-chains for each original fields that go into that. One related hack would be to create a subclass of FieldMutatingUpdateProcessorFactory that wraps an arbitrary tokenizer and splits out tokens as multi-value output. This is a bit hazy, even in my own mind, but hopefully gives you something new to think about. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. 
I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
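Alexandre's pre-tokenization idea can be sketched in plain Java (the class name, method, and length threshold below are hypothetical, not Solr APIs): inside a real UpdateRequestProcessor's processAdd() you would apply something like this to the cloned source-field values before they reach the KeywordTokenizer-based collector field, so short phrases (authors, titles) survive as single tokens while long text arrives pre-split into words:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the pre-tokenization an UpdateRequestProcessor could do:
// short values pass through whole (KeywordTokenizer will keep them as one token),
// longer text is split into word tokens before the content ever hits Solr.
public class SuggestFieldPreTokenizer {
    static final int KEYWORD_MAX_WORDS = 4; // assumed threshold; tune for your data

    public static List<String> preTokenize(List<String> sourceValues) {
        List<String> out = new ArrayList<>();
        for (String value : sourceValues) {
            String[] words = value.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (words.length <= KEYWORD_MAX_WORDS) {
                out.add(String.join(" ", words)); // keep short phrase as one value
            } else {
                for (String w : words) out.add(w); // emit each word as its own value
            }
        }
        return out;
    }
}
```

The output list becomes the multi-valued content of the collector field; a real processor would build it in processAdd() from the cloned author, title, and full-text fields.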
Re: svn vs GIT
thanks sir, in that case i need to know about svn as well. Thanks Aman Tandon On Fri, Apr 11, 2014 at 7:26 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: You can find the read-only Git's version of Lucene+Solr source code here: https://github.com/apache/lucene-solr . The SVN preference is Apache Foundation's choice and legacy. Most of the developers' workflows are also around SVN. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 7:48 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, I am new here, i have question in mind that why we are preferring the svn more than git? -- With Regards Aman Tandon
Re: multiple analyzers for one field
Hi Michael, It IS possible to utilize multiple Analyzers within a single field, but it's not a built-in capability of Solr right now. I wrote something I called a MultiTextField which provides this capability, and you can see the code here: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14 The general idea is that you can pass in a prefix for each piece of your content and then use that prefix to dynamically select one or more Analyzers for each piece of content. So, for example, you could pass in something like this when indexing your document (for a multiValued field): <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> Then, the MultiTextField will parse the prefixes and dynamically grab an Analyzer based upon the prefix. In this case, the first input will be processed using an English Analyzer, the second input will use a Spanish Analyzer, and the third input will use both a German and a French Analyzer, as defined when the field is defined in the schema.xml: <fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/> <field name="someMultiTextField" type="multiText" indexed="true" multiValued="true"/> If you want to automagically map separate fields into one of these dynamic analyzer (MultiText) fields with prefixes, you could either pass the text in multiple times from the client to the same field (with different Analyzer prefixes each time, as shown above), OR you could write an Update Request Processor that does this for you. I don't think it is possible to just have the copyField add in prefixes automatically for you, though someone please correct me if I'm wrong. If you implement an Update Request Processor, then inside it you would simply grab the text from each of the relevant fields (i.e. 
author and title fields) and then add that field's value to the named MultiText field with the appropriate Analyzer prefix based upon each field. I made an example Update Request Processor (see the previous github link and look for MultiTextFieldLanguageIdentifierUpdateProcessor) that you could look at as an example of how to supply different analyzer prefixes to different values within a multiValued field, though you would obviously want to throw away all the language detection stuff since it doesn't match your specific use case. All that being said, this solution may end up being overly complicated for your use case, so your idea of creating a custom analyzer to just handle your example might be much less complicated. At any rate, that's the specific answer to your specific question about whether it is possible to utilize multiple Analyzers within a field based upon multiple inputs. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @ CareerBuilder On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. 
I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests
Re: multiple analyzers for one field
Yes, I see - I could essentially do the tokenization myself (or using some Analyzer chain) in an Update Processor. Yes I think that could work. Thanks, Alex! -Mike On 4/10/14 10:09 PM, Alexandre Rafalovitch wrote: It's an interesting question. To start from, the copyField copies the source content, so there is no source-related tokenization description. Only the target's one. So, that approach is not suitable. Regarding the lookups/auto-complete. There has been a bunch of various implementations added recently, but they are not really documented. Things like BlendedInfixSuggester are a bit hard to discover at the moment. So, there might be something there if one digs a lot. The other option is to do the tokenization in the UpdateRequestProcessor chain. You could clone a field, and do some processing so that by the time the content hits solr, it's already pre-tokenized into multi-value field. Then, you could have KeywordTokenizer on your collector field and separate URPs sub-chains for each original fields that go into that. One related hack would be to create a subclass of FieldMutatingUpdateProcessorFactory that wraps an arbitrary tokenizer and splits out tokens as multi-value output. This is a bit hazy, even in my own mind, but hopefully gives you something new to think about. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 8:05 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. 
I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (ie with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.addField () with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up. Thanks Mike On 4/9/2014 4:16 PM, Michael Sokolov wrote: I think I would like to do something like copyfield from a bunch of fields into a single field, but with different analysis for each source, and I'm pretty sure that's not a thing. Is there some alternate way to accomplish my goal? Which is to have a suggester that suggests words from my full text field and complete phrases drawn from my author and title fields all at the same time. So If I could index author and title using KeyWordAnalyzer, and full text tokenized, that would be the bees knees. -Mike
Re: multiple analyzers for one field
Thanks for your detailed answer, Trey! I guess it helps to have just written that book :) By the way, I am eager to get it on our platform (safariflow.com -- but I think it hasn't arrived from Manning yet). I had a half-baked idea about using a prefix like that. It did seem like it would be somewhat complicated, but certainly with your example code I'd have a leg up - thanks again. -Mike On 4/10/14 10:42 PM, Trey Grainger wrote: Hi Michael, It IS possible to utilize multiple Analyzers within a single field, but it's not a built-in capability of Solr right now. I wrote something I called a MultiTextField which provides this capability, and you can see the code here: https://github.com/treygrainger/solr-in-action/tree/master/src/main/java/sia/ch14 The general idea is that you can pass in a prefix for each piece of your content and then use that prefix to dynamically select one or more Analyzers for each piece of content. So, for example, you could pass in something like this when indexing your document (for a multiValued field): <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> Then, the MultiTextField will parse the prefixes and dynamically grab an Analyzer based upon the prefix. 
In this case, the first input will be processed using an English Analyzer, the second input will use a Spanish Analyzer, and the third input will use both a German and a French Analyzer, as defined when the field is defined in the schema.xml:

<fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english,es:text_spanish,fr:text_french,de:text_german"/>
<field name="someMultiTextField" type="multiText" indexed="true" multiValued="true"/>

If you want to automagically map separate fields into one of these dynamic analyzer (MultiText) fields with prefixes, you could either pass the text in multiple times from the client to the same field (with a different Analyzer prefix each time, as shown above), OR you could write an Update Request Processor that does this for you. I don't think it is possible to just have copyField add in the prefixes automatically for you, though someone please correct me if I'm wrong. If you implement an Update Request Processor, then inside it you would simply grab the text from each of the relevant fields (i.e. the author and title fields) and then add each field's value to the named MultiText field with the appropriate Analyzer prefix for that field. I made an example Update Request Processor (see the previous github link and look for MultiTextFieldLanguageIdentifierUpdateProcessor) that you could look at as an example of how to supply different analyzer prefixes to different values within a multiValued field, though you would obviously want to throw away all the language detection stuff since it doesn't match your specific use case. All that being said, this solution may end up being overly complicated for your use case, so your idea of creating a custom analyzer to just handle your example might be much less complicated.
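The prefix convention Trey describes (e.g. `en|some text`, `de,fr|some other text`) is easy to model. This is only an illustrative sketch of the parsing step, not the actual MultiTextField code; the delimiter characters match the examples above:

```python
def split_analyzer_prefix(raw, delimiter="|", key_separator=","):
    """Split 'de,fr|some other text' into (['de', 'fr'], 'some other text').
    A value without a prefix yields an empty key list (i.e. use the
    field type's default analyzer)."""
    if delimiter in raw:
        prefix, text = raw.split(delimiter, 1)
        return prefix.split(key_separator), text
    return [], raw

print(split_analyzer_prefix("en|some text"))          # (['en'], 'some text')
print(split_analyzer_prefix("de,fr|some other text")) # (['de', 'fr'], 'some other text')
```

Each returned key would then be looked up in the `fieldMappings` attribute to pick the analyzer(s) for that value.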
At any rate, that's the specific answer to your specific question about whether it is possible to utilize multiple Analyzers within a field based upon multiple inputs. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @ CareerBuilder On Thu, Apr 10, 2014 at 9:05 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: The lack of response to this question makes me think that either there is no good answer, or maybe the question was too obtuse. So I'll give it one more go with some more detail ... My main goal is to implement autocompletion with a mix of words and short phrases, where the words are drawn from the text of largish documents, and the phrases are author names and document titles. I think the best way to accomplish this is to concoct a single field that contains data from these other source fields (as usual with copyField), but with some of the fields treated as keywords (i.e. with their values inserted as single tokens), and others tokenized. I believe this would be possible at the Lucene level by calling Document.add() with multiple fields having the same name: some marked as TOKENIZED and others not. I think the tokenized fields would have to share the same analyzer, but that's OK for my case. I can't see how this could be made to happen in Solr without a lot of custom coding though. It seems as if the conversion from Solr fields to Lucene fields is not an easy thing to influence. If anyone has an idea how to achieve the subgoal, or perhaps a different way of getting at the main goal, I'd love to hear about it. So far my only other idea is to write some kind of custom analyzer that treats short texts as keywords and tokenizes longer ones, which is probably what I'll look at if nothing else comes up.
Re: Relevance/Rank
Aman: Oops, looked at the wrong part of the query, didn't see the bq clause. You're right of course. Sorry for the misdirection. Erick
Re: deleting large amount data from solr cloud
First, there is no master node, just leaders and replicas. But that's a nit. No real clue why you would be going out of memory. Deleting a document, even by query, should just mark the docs as deleted, a pretty low-cost operation. How much memory are you giving the JVM? Best, Erick On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis poth...@gmail.com wrote: [solr version 4.3.1] Hello, I have a solr cloud (4 nodes - 2 shards) with a fairly large number of documents (~360G of index per shard). Now, a major portion of the data is not required and I need to delete those documents. I would need to delete around 75% of the data. One of the solutions could be to drop the index completely and re-index. But this is not an option at the moment. We tried to delete the data through a query - say 1 day's/1 month's worth of data at a time. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering if there is any way to incrementally delete the data without affecting the cluster adversely. Thanks! Vinay
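One common way to keep heap pressure down in a situation like this is to break the delete into many small delete-by-query requests (e.g. one per day), committing between batches, rather than issuing one huge range delete. A hedged sketch that only builds the request bodies - the field name `timestamp_dt` is an assumption about the schema, and each body would be POSTed to `/solr/<collection>/update` followed by a commit:

```python
import json
from datetime import date, timedelta

def daily_delete_queries(field, start, end):
    """Yield one Solr delete-by-query JSON body per day in [start, end)."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # exclusive upper bound so consecutive days don't overlap
        q = f"{field}:[{day}T00:00:00Z TO {nxt}T00:00:00Z}}"
        yield json.dumps({"delete": {"query": q}})
        day = nxt

for body in daily_delete_queries("timestamp_dt", date(2014, 1, 1), date(2014, 1, 3)):
    print(body)
```

Small batches with intervening commits give the cluster a chance to flush deleted-doc bookkeeping instead of accumulating it all in one request.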
Re: Relevance/Rank
It's fine, Erick. I am guessing that maybe fq=(SKU:204-161) is the issue: this SKU with that value is present in all results, and that's why the Name products are not getting boosted. Ravi: check your results without the filter - do all the results include SKU:204-161? I guess this may help. On Fri, Apr 11, 2014 at 9:22 AM, Erick Erickson erickerick...@gmail.com wrote: Aman: Oops, looked at the wrong part of the query, didn't see the bq clause. You're right of course. Sorry for the misdirection. Erick -- With Regards Aman Tandon
Re: Pushing content to Solr from Nutch
Does your Solr schema match the data output by nutch? It's up to you to create a Solr schema that matches the output of nutch - read up on the nutch doc for that info. Solr doesn't define that info, nutch does. -- Jack Krupansky From: Xavier Morera Sent: Thursday, April 10, 2014 12:58 PM To: solr-user@lucene.apache.org Subject: Pushing content to Solr from Nutch Hi, I have followed several Nutch tutorials - including the main one http://wiki.apache.org/nutch/NutchTutorial - to crawl sites (which works, I can see in the console as the pages get crawled and the directories built with the data) but for the life of me I can't get anything posted to Solr. The Solr console doesn't even squint, therefore Nutch is not sending anything. This is the command that I send over that crawls and in theory should also post:

bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2

But I found that I could also use this one when it is already crawled:

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*

But no luck. The only thing that called my attention is this message; I read that adding the property below would fix it, but it doesn't:

No IndexWriters activated - check your configuration

This is the property:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

Any idea? Apache Nutch 1.8 running Java 1.6 via Cygwin on Windows. -- Xavier Morera email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
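For what it's worth: in Nutch 1.x releases after 1.6 the index writers are themselves plugins, and "No IndexWriters activated" typically means no indexer plugin is enabled in plugin.includes. A hedged guess at the fix, adding `indexer-solr` to the value quoted in the message above (worth verifying against the Nutch 1.8 documentation for your setup):

```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|indexer-solr|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```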
DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7
I am using *DeltaImportHandler* for indexing data in Solr. Currently I am manually indexing the data into Solr by selecting the full-import or delta-import commands from the Solr Admin screen. I am using Windows 7 and would like to automate the process by specifying a certain time interval for executing the commands, through the Windows Task Scheduler or something similar, e.g. every two minutes it should index data into Solr. From a few sites I came to know that I need to create a *batch file* with some command to run the imports, and that the batch file is run using the *Windows Scheduler*. But there were no examples regarding this. I am not sure what to code in the batch file and how to link it with the scheduler. Can someone provide me the code and the steps to accomplish it? Thanks a lot in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-delta-imports-in-Solr-in-windows-7-tp4130565.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7
DataImportHandler is just a URL call. You can see the specific URL you want to call by opening the debugger window in Chrome/Firefox and looking at the network tab. Then, you have a general problem of how to call a URL from the Windows Scheduler. Google brings a lot of results for that, so you should be able to find something you prefer. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 11, 2014 at 12:02 PM, harshrossi harshro...@gmail.com wrote: I am using *DeltaImportHandler* for indexing data in Solr. Currently I am manually indexing the data into Solr by selecting commands full-import or delta-import from the Solr Admin screen. I am using Windows 7 and would like to automate the process by specifying a certain time interval for executing the commands through windows task scheduler or something. e.g.: like every two minutes it should index data into solr. From few sites I came to know that I need to create a *batch file* with some command to run the imports and the batch file is run using *windows scheduler*. But there were no examples regarding this. I am not sure what to code in the batch file and how to link it with the scheduler. Can someone provide me the code and the steps to accomplish it? Thanks a lot in advance.
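As Alex says, a delta import is just an HTTP GET. A minimal sketch of building that URL - the core name `db` and the extra parameters are illustrative assumptions, not from the thread; adjust them to your setup:

```python
from urllib.parse import urlencode

def dih_url(host, core, command="delta-import", **params):
    """Build a DataImportHandler request URL for the given core and command."""
    query = urlencode({"command": command, **params})
    return f"http://{host}/solr/{core}/dataimport?{query}"

url = dih_url("localhost:8983", "db", clean="false", commit="true")
print(url)
```

A scheduled Windows batch file could then contain a single line such as `curl "<that url>"`, registered in Task Scheduler with a two-minute repeat interval.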