Re: Solr groups not matching with terms in a field

2015-01-16 Thread Naresh Yadav
Hi Ahmet,

Thanks, now I understand better; I will not try my use case with grouping.
Actually, I am interested in the unique terms in a field, i.e. tenant_pool. Those I
get perfectly with http://www.imagesup.net/?di=614212438580

But I am not able to get the terms after applying a filter, say type:1.
That is, I need the unique terms in the tenant_pool field for the type:1 query, and the
answer would be P1, L1.
Please suggest how I can get this without reading each doc from disk.

On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Naresh,

 I have never grouped on a tokenised field and I am not sure it makes sense
 to do so.

 Reading back ref-guide it says this about group.field parameter

 The name of the field by which to group results. The field must be
 single-valued, and either be indexed or a field type that has a value
 source and works in a function query, such as ExternalFileField. It must
 also be a string-based field, such as StrField or TextField


 https://cwiki.apache.org/confluence/display/solr/Result+Grouping

 Therefore, it should be single valued. P.S. Don't get confused with
 TextField type, for example it could create single token when used with
 keyword tokenizer.

 Ahmet

 On Friday, January 16, 2015 4:43 AM, Naresh Yadav nyadav@gmail.com
 wrote:
 Hi ahmet,

 If you observe output ngroups is 1 and returning only one group P1.
 But my expectation is it should return three groups P1, L1, L2 as my
 field is tokenized with space.

 Please correct me if wrong?


 On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:
 
 
  Hi Naresh,
 
  Everything looks correct, what is the problem here?
 
  If you want to see more than one document per group, there is a parameter
  for that which defaults to 1.
 
  Ahmet
 
 
 
  On Thursday, January 15, 2015 9:02 AM, Naresh Yadav 
 nyadav@gmail.com
  wrote:
  Hi all,
 
  I had done following configuration to test Solr grouping concept.
 
  solr version :  4.6.1 (tried in latest version 4.10.3 also)
  Schema : http://www.imagesup.net/?di=10142124357616
  Solrj code to insert docs :
 http://www.imagesup.net/?di=10142124381116
  Response Group's :  http://www.imagesup.net/?di=1114212438351
  Response Terms' : http://www.imagesup.net/?di=614212438580
 
   Please let me know if I am doing something wrong here



Re: Solr groups not matching with terms in a field

2015-01-16 Thread Ahmet Arslan
Hi Naresh,

Yup, the terms component does not respect the q or fq parameters.
Luckily, that's easy with the facet component. Example:
facet=true&facet.field=tenant_pool&q=type:1

Please see more here: https://cwiki.apache.org/confluence/display/solr/Faceting

happy faceting,
ahmet
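
For SolrJ users, the same request can be built in code. A minimal sketch against the Solr 4.x SolrJ API (the core URL is an assumption; the field and filter are the ones from this thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TenantPoolTerms {
        public static void main(String[] args) throws Exception {
            // Assumed core URL; adjust to your setup.
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("type:1");   // q=type:1
            query.setFacet(true);                        // facet=true
            query.addFacetField("tenant_pool");          // facet.field=tenant_pool

            QueryResponse response = server.query(query);
            FacetField facet = response.getFacetField("tenant_pool");
            for (FacetField.Count count : facet.getValues()) {
                System.out.println(count.getName() + " : " + count.getCount());
            }
            server.shutdown();
        }
    }

Each facet value printed here is a term of tenant_pool restricted to docs matching type:1, which is exactly the filtered unique-terms list Naresh asked for.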



On Friday, January 16, 2015 10:13 AM, Naresh Yadav nyadav@gmail.com wrote:
Hi ahmet,

Thanks, now i understand better, i will not try my usecase with grouping.
Actually i am interested in unique terms in a field i.e tenant_pool. That i
get perfectly with http://www.imagesup.net/?di=614212438580

But i am not able to get terms after applying some filter say type:1.
That is I need unique terms in tenant_pool field for type:1 query and
answer will be P1, L1.
Please suggest me if i can get this with out reading each doc from disk.


On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Naresh,

 I have never grouped on a tokenised field and I am not sure it makes sense
 to do so.

 Reading back ref-guide it says this about group.field parameter

 The name of the field by which to group results. The field must be
 single-valued, and either be indexed or a field type that has a value
 source and works in a function query, such as ExternalFileField. It must
 also be a string-based field, such as StrField or TextField


 https://cwiki.apache.org/confluence/display/solr/Result+Grouping

 Therefore, it should be single valued. P.S. Don't get confused with
 TextField type, for example it could create single token when used with
 keyword tokenizer.

 Ahmet

 On Friday, January 16, 2015 4:43 AM, Naresh Yadav nyadav@gmail.com
 wrote:
 Hi ahmet,

 If you observe output ngroups is 1 and returning only one group P1.
 But my expectation is it should return three groups P1, L1, L2 as my
 field is tokenized with space.

 Please correct me if wrong?


 On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:
 
 
  Hi Naresh,
 
  Everything looks correct, what is the problem here?
 
  If you want to see more than one document per group, there is a parameter
  for that which defaults to 1.
 
  Ahmet
 
 
 
  On Thursday, January 15, 2015 9:02 AM, Naresh Yadav 
 nyadav@gmail.com
  wrote:
  Hi all,
 
  I had done following configuration to test Solr grouping concept.
 
  solr version :  4.6.1 (tried in latest version 4.10.3 also)
  Schema : http://www.imagesup.net/?di=10142124357616
  Solrj code to insert docs :
 http://www.imagesup.net/?di=10142124381116
  Response Group's :  http://www.imagesup.net/?di=1114212438351
  Response Terms' : http://www.imagesup.net/?di=614212438580
 
   Please let me know if I am doing something wrong here



Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Siegfried Goeschl

Hi Dan,

neat idea - made a mental note :-)

That brings us back to the point that in complex setups you should not 
do the document pre-processing directly in Solr, but have an import 
process which can safely crash when processing a 4 GB PDF file.


Cheers,

Siegfried Goeschl

On 16.01.15 05:02, Dan Davis wrote:

Why re-write all the document conversion in Java ;)  Tika is very slow.   5
GB PDF is very big.

If you have a lot of PDF like that try pdftotext in HTML and UTF-8 output
mode.   The HTML mode captures some meta-data that would otherwise be lost.


If you need to go faster still, you can  also write some stuff linked
directly against poppler library.

Before you jump down my throat about Tika being slow - I wrote a PDF
indexer that ran at 36 MB/s per core.   Different indexer, all C, lots of
setjmp/longjmp.   But fast...



On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote:


Siegfried and Michael Thank you for your replies and help.

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
Sent: Thursday, January 15, 2015 3:45 AM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError for PDF document upload into Solr

Hi Ganesh,

you can increase the heap size but parsing a 4 GB PDF document will very
likely consume A LOT OF memory - I think you need to check if that large
PDF can be parsed at all :-)

Cheers,

Siegfried Goeschl

On 14.01.15 18:04, Michael Della Bitta wrote:

Yep, you'll have to increase the heap size for your Tomcat container.

http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial
-heap-size-correctly

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/11200277628550959
3336/posts
w: appinions.com http://www.appinions.com/

On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote:


Hello,

Can someone pass on the hints to get around following error? Is there
any Heap Size parameter I can set in Tomcat or in Solr webApp that
gets deployed in Solr?

I am running Solr webapp inside Tomcat on my local machine which has
RAM of 12 GB. I have PDF document which is 4 GB max in size that
needs to be loaded into Solr




Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
      at java.util.AbstractCollection.toArray(Unknown Source)
      at java.util.ArrayList.<init>(Unknown Source)
      at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
      at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
      at ...



Re: How to select the correct number of Shards in SolrCloud

2015-01-16 Thread Daniel Collins
Sharding a query lets you parallelize the actual querying-the-index part of
the search. But remember that as soon as you spread the query out more, you
also need to bring all 64 result sets back together and consolidate them
into a single result set for the end user.  At some point, the gain of
being able to search the data quicker is outweighed by the cost of this
consolidation activity.

One other point to mention, which we noticed as a by-product of some
large-scale sharding we were testing (256 shards, no caches, a whole
different kettle of fish!).

The resulting query is only as fast as the slowest shard.  If you have 64
shards, and 8 shards/cores per machine, how many JVMs are you running per
machine?  If you have a single JVM with 8 cores in it, then remember as
soon as that JVM enters a GC cycle, all those 8 cores will stall
processing.  If you have a query and it needs to get results from 64 cores,
if 63 return in 100ms but the last core is in GC pause and takes 500ms,
your query will take just over 500ms.

With respect to sharding, I would never start with a large number of shards
(and 64 is reasonably large in Solr terms). You might be able to get away
without sharding at all; if that meets your latency requirements, why
bother with the complexity of sharding? Use those extra CPUs for
processing more QPS instead of making a single query faster.

Lastly, you mentioned you allocated 32GB to Solr; do you mean to the JVM
heap?  That's quite a lot of a 64GB machine; you haven't left much for the
page cache.  The general rule for Solr is to make the JVM heap as small as
you can get away with, to leave the OS page cache (which is needed to cache
all the index files) with as much memory as possible.

On 16 January 2015 at 05:58, Manohar Sripada manohar...@gmail.com wrote:

 Hi All,

 My Setup is as follows. There are 16 nodes in my SolrCloud and 4 CPU cores
 on each Solr Node VM. Each having 64 GB of RAM, out of which I have
 allocated 32 GB to Solr. I have a collection which contains around 100
 million Docs, which I created with 64 shards, replication factor 2, and 8
 shards per node. Each shard is getting around 1.6 Million Documents.

 The reason I have created 64 Shards is there are 4 CPU cores on each VM;
 while querying I can make use of all the CPU cores. On an average, Solr
 QTime is around 500ms here.

 Last time to my other discussion, Erick suggested that I might be over
 sharding, So, I tried reducing the number of shards to 32 and then 16. To
 my surprise, it started performing better. It came down to 300 ms (for 32
 shards) and 100 ms (for 16 shards). I haven't tested with filters and
 facets yet here. But, the simple search queries had shown lot of
 improvement.

 So, how come the less number of shards performing better?? Is it because
 there are less number of posting lists to search on OR less merges that are
 happening? And how to determine the correct number of shards?

 Thanks,
 Manohar



Re: Solr groups not matching with terms in a field

2015-01-16 Thread Naresh Yadav
I tried faceting also, but it did not work smoothly for me. The case I had
mentioned in the email is a dummy one; my actual index is
12 lakh docs and 2 GB in size on a single machine. Each tenant_pool field
value has 20-30 tokens.
Getting all terms in tenant_pool is fast (a matter of seconds), but when I go the
facet path after a filter criterion, it is very slow, because
it reads the whole field from disk and I am only interested in the terms.

On Fri, Jan 16, 2015 at 1:48 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Naresh,

 Yup terms component does not respect q or fq parameter.
 Luckily, thats easy with facet component. Example :
  facet=true&facet.field=tenant_pool&q=type:1

 Please see more here :
 https://cwiki.apache.org/confluence/display/solr/Faceting

 happy faceting,
 ahmet



 On Friday, January 16, 2015 10:13 AM, Naresh Yadav nyadav@gmail.com
 wrote:
 Hi ahmet,

 Thanks, now i understand better, i will not try my usecase with grouping.
 Actually i am interested in unique terms in a field i.e tenant_pool. That i
 get perfectly with http://www.imagesup.net/?di=614212438580

 But i am not able to get terms after applying some filter say type:1.
 That is I need unique terms in tenant_pool field for type:1 query and
 answer will be P1, L1.
 Please suggest me if i can get this with out reading each doc from disk.


 On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi Naresh,
 
  I have never grouped on a tokenised field and I am not sure it makes
 sense
  to do so.
 
  Reading back ref-guide it says this about group.field parameter
 
  The name of the field by which to group results. The field must be
  single-valued, and either be indexed or a field type that has a value
  source and works in a function query, such as ExternalFileField. It must
  also be a string-based field, such as StrField or TextField
 
 
  https://cwiki.apache.org/confluence/display/solr/Result+Grouping
 
  Therefore, it should be single valued. P.S. Don't get confused with
  TextField type, for example it could create single token when used with
  keyword tokenizer.
 
  Ahmet
 
  On Friday, January 16, 2015 4:43 AM, Naresh Yadav nyadav@gmail.com
  wrote:
  Hi ahmet,
 
  If you observe output ngroups is 1 and returning only one group P1.
  But my expectation is it should return three groups P1, L1, L2 as my
  field is tokenized with space.
 
  Please correct me if wrong?
 
 
  On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:
  
  
   Hi Naresh,
  
   Everything looks correct, what is the problem here?
  
   If you want to see more than one document per group, there is a
 parameter
   for that which defaults to 1.
  
   Ahmet
  
  
  
   On Thursday, January 15, 2015 9:02 AM, Naresh Yadav 
  nyadav@gmail.com
   wrote:
   Hi all,
  
   I had done following configuration to test Solr grouping concept.
  
   solr version :  4.6.1 (tried in latest version 4.10.3 also)
   Schema : http://www.imagesup.net/?di=10142124357616
   Solrj code to insert docs :
  http://www.imagesup.net/?di=10142124381116
   Response Group's :  http://www.imagesup.net/?di=1114212438351
   Response Terms' : http://www.imagesup.net/?di=614212438580
  
    Please let me know if I am doing something wrong here
 



Re: How to select the correct number of Shards in SolrCloud

2015-01-16 Thread Shawn Heisey
On 1/15/2015 10:58 PM, Manohar Sripada wrote:
 The reason I have created 64 Shards is there are 4 CPU cores on each VM;
 while querying I can make use of all the CPU cores. On an average, Solr
 QTime is around 500ms here.
 
 Last time to my other discussion, Erick suggested that I might be over
 sharding, So, I tried reducing the number of shards to 32 and then 16. To
 my surprise, it started performing better. It came down to 300 ms (for 32
 shards) and 100 ms (for 16 shards). I haven't tested with filters and
 facets yet here. But, the simple search queries had shown lot of
 improvement.
 
 So, how come the less number of shards performing better?? Is it because
 there are less number of posting lists to search on OR less merges that are
 happening? And how to determine the correct number of shards?

Daniel has replied with good information.

One additional problem I can think of when there are too many shards: If
your Solr server is busy enough to have any possibility of simultaneous
requests, then you will find that it's NOT a good idea to create enough
shards to use all your CPU cores.  In that situation, when you do a
single query, all your CPU cores will be in use.  When multiple queries
happen at the same time, they have to share the available CPU resources,
slowing them down.  With a smaller number of shards, the additional CPU
cores can handle simultaneous queries.

I have an index with nearly 100 million documents.  I've divided it into
six large cold shards and one very small hot shard.  It's not SolrCloud.
 I put three large shards on each of two servers, and the small shard on
one of those two servers.  The distributed query normally happens on the
server without the small shard.  Each server has 8 CPU cores and 64GB of
RAM.  Solr requires a 6GB heap.

My median QTime over the last 231836 queries is 25 milliseconds and my
95th percentile QTime is 376 milliseconds.  My query rate is pretty low
- I've never seen Solr's statistics for the 15 minute query rate go
above a single digit per second.

Thanks,
Shawn



Re: Solr groups not matching with terms in a field

2015-01-16 Thread Ahmet Arslan
Hi,

That's a different problem: speeding up faceting.
Faceting is used all over the place, and it is fast. I suggest you look for
faceting improvements.

Ahmet



On Friday, January 16, 2015 11:17 AM, Naresh Yadav nyadav@gmail.com wrote:
I tried facetting also but not worked smoothly for me. Case i had mentioned
in email is dummy one and my actual index is with
12 lakh docs and 2 GB size on single machine. Each of tenant_pool field
value has 20-30 tokens.
Getting all terms in tenant_pool is fast in seconds but when i go with
facet path after filter criteria then that is very slow. Because
it is reading whole field from disk and i am only interested in terms.


On Fri, Jan 16, 2015 at 1:48 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Naresh,

 Yup terms component does not respect q or fq parameter.
 Luckily, thats easy with facet component. Example :
  facet=true&facet.field=tenant_pool&q=type:1

 Please see more here :
 https://cwiki.apache.org/confluence/display/solr/Faceting

 happy faceting,
 ahmet



 On Friday, January 16, 2015 10:13 AM, Naresh Yadav nyadav@gmail.com
 wrote:
 Hi ahmet,

 Thanks, now i understand better, i will not try my usecase with grouping.
 Actually i am interested in unique terms in a field i.e tenant_pool. That i
 get perfectly with http://www.imagesup.net/?di=614212438580

 But i am not able to get terms after applying some filter say type:1.
 That is I need unique terms in tenant_pool field for type:1 query and
 answer will be P1, L1.
 Please suggest me if i can get this with out reading each doc from disk.


 On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi Naresh,
 
  I have never grouped on a tokenised field and I am not sure it makes
 sense
  to do so.
 
  Reading back ref-guide it says this about group.field parameter
 
  The name of the field by which to group results. The field must be
  single-valued, and either be indexed or a field type that has a value
  source and works in a function query, such as ExternalFileField. It must
  also be a string-based field, such as StrField or TextField
 
 
  https://cwiki.apache.org/confluence/display/solr/Result+Grouping
 
  Therefore, it should be single valued. P.S. Don't get confused with
  TextField type, for example it could create single token when used with
  keyword tokenizer.
 
  Ahmet
 
  On Friday, January 16, 2015 4:43 AM, Naresh Yadav nyadav@gmail.com
  wrote:
  Hi ahmet,
 
  If you observe output ngroups is 1 and returning only one group P1.
  But my expectation is it should return three groups P1, L1, L2 as my
  field is tokenized with space.
 
  Please correct me if wrong?
 
 
  On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:
  
  
   Hi Naresh,
  
   Everything looks correct, what is the problem here?
  
   If you want to see more than one document per group, there is a
 parameter
   for that which defaults to 1.
  
   Ahmet
  
  
  
   On Thursday, January 15, 2015 9:02 AM, Naresh Yadav 
  nyadav@gmail.com
   wrote:
   Hi all,
  
   I had done following configuration to test Solr grouping concept.
  
   solr version :  4.6.1 (tried in latest version 4.10.3 also)
   Schema : http://www.imagesup.net/?di=10142124357616
   Solrj code to insert docs :
  http://www.imagesup.net/?di=10142124381116
   Response Group's :  http://www.imagesup.net/?di=1114212438351
   Response Terms' : http://www.imagesup.net/?di=614212438580
  
    Please let me know if I am doing something wrong here
 



Re: Easiest way to embed solr in a desktop application

2015-01-16 Thread Ramkumar R. Aiyengar
That's correct. Even though it should still be possible to embed Jetty,
that could change in the future, and that's why support for pluggable
containers is being taken away.

If you need to deal with the index at a lower level, there's always Lucene
you can use as a library instead of Solr.

But I am assuming you need to use the search engine at a higher level than
that, and hence you ask for Solr. In that case, I urge you to think through
whether you really can't run this out of process; maybe this is an XY problem.
Keep in mind that Solr has the ability to provide higher-level
functionality because it can control almost the entirety of the application
(which is the philosophical reason behind removal of the war as well), and
that's the reason something like EmbeddedSolrServer will always have
caveats.
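
For reference, a minimal sketch of what embedded use looks like with the Solr 4.x API (the solr home path and core name are illustrative assumptions, not taken from this thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            // Assumes a standard solr home containing solr.xml and a core named "collection1".
            CoreContainer container = new CoreContainer("/path/to/solr-home");
            container.load();
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
            try {
                QueryResponse response = server.query(new SolrQuery("*:*"));
                System.out.println("numFound: " + response.getResults().getNumFound());
            } finally {
                server.shutdown(); // also shuts down the CoreContainer
            }
        }
    }

Note the caveat above: this is a direct, in-process connection to the cores, so none of the HTTP-level features (or the admin UI) are available.
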
On 15 Jan 2015 15:09, Robert Krüger krue...@lesspain.de wrote:

 I was considering the programmatic Jetty option but then I read that Solr 5
 no longer supports being run with an external servlet container but maybe
 they still support programmatic jetty use in some way. atm I am using solr
 4.x, so this would work. No idea if this gets messy classloader-wise in any
 way.

 I have been using exactly the approach you described in the past, i.e. I
 built a really, really simple swing dialogue to input queries and display
 results in a table but was just guessing that the built-in ui was far
 superior but maybe I should just live with it for the time being.

 On Thu, Jan 15, 2015 at 3:56 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  It’d certainly be easiest to just embed Jetty into your application.  You
  don’t need to have Jetty as a separate process, you could launch it
 through
  it’s friendly Java API, configured to use solr.war.
 
  If all you needed was to make HTTP(-like) queries to Solr instead of the
  full admin UI, your application could stick to using EmbeddedSolrServer
 and
  also provide a UI that takes in a Solr query string (or builds one up)
 and
  then sends it to the embedded Solr and displays the result.
 
  Erik
 
   On Jan 15, 2015, at 9:44 AM, Robert Krüger krue...@lesspain.de
 wrote:
  
   Hi Andrea,
  
   you are assuming correctly. It is a local, non-distributed index that
 is
   only accessed by the containing desktop application. Do you know if
 there
   is a possibility to run the Solr admin UI on top of an embedded
 instance
   somehow?
  
   Thanks a lot,
  
   Robert
  
   On Thu, Jan 15, 2015 at 3:17 PM, Andrea Gazzarini 
 a.gazzar...@gmail.com
  
   wrote:
  
   Hi Robert,
   I've used the EmbeddedSolrServer in a scenario like that and I never
 had
   problems.
   I assume you're talking about a standalone application, where the
 whole
   index resides locally and you don't need any cluster / cloud /
  distributed
   feature.
  
   I think the usage of EmbeddedSolrServer is discouraged in a
  (distributed)
   service scenario, because it is a direct connection to a SolrCore
   instance...but this is not a problem in the situation you described
 (as
  far
   as I know)
  
   Best,
   Andrea
  
  
   On 01/15/2015 03:10 PM, Robert Krüger wrote:
  
   Hi,
  
   I have been using an embedded instance of solr in my desktop
  application
   for a long time and it works fine. At the time when I made that
  decision
   (vs. firing up a solr web application within my swing application) I
  got
   the impression embedded use is somewhat unsupported and I should
 expect
   problems.
  
   My first question is, is this still the case now (4 years later),
 that
   embedded solr is discouraged?
  
   The one limitation I am running into is that I cannot use the solr
  admin
   UI
   for debugging purposes (mainly for running queries). Is there any
 other
   way
   to do this other than no longer using embedded solr and
  programmatically
   firing up a web application (e.g. using jetty)? Should I do the
 latter
   anyway?
  
   Any insights/advice greatly appreciated.
  
   Best regards,
  
   Robert
  
  
  
  
  
   --
   Robert Krüger
   Managing Partner
    Lesspain GmbH & Co. KG
  
   www.lesspain-software.com
 
 


 --
 Robert Krüger
 Managing Partner
  Lesspain GmbH & Co. KG

 www.lesspain-software.com



Re: Solr groups not matching with terms in a field

2015-01-16 Thread Naresh Yadav
Thanks Ahmet, my problem is solved. The reason for the slow performance of the facet
query was not doing setRows(0).
Once I did that, it came back in seconds, like the terms query.
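
For anyone hitting the same thing, the relevant SolrJ fragment would look like this (a sketch; only the setRows(0) line differs from the facet example earlier in this digest):

    import org.apache.solr.client.solrj.SolrQuery;

    // ...
    SolrQuery query = new SolrQuery("type:1");
    query.setFacet(true);
    query.addFacetField("tenant_pool");
    query.setRows(0); // the fix: return facet counts only, fetch no documents

With rows left at its default, Solr also retrieves and returns the top matching documents, which explains the extra disk work described above.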

On Fri, Jan 16, 2015 at 3:25 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi,

 Thats a different problem : speed-up faceting.
 Faceting used all over the place and it is fast. I suggest you looks for
 faceting improvements.

 Ahmet



 On Friday, January 16, 2015 11:17 AM, Naresh Yadav nyadav@gmail.com
 wrote:
 I tried facetting also but not worked smoothly for me. Case i had mentioned
 in email is dummy one and my actual index is with
 12 lakh docs and 2 GB size on single machine. Each of tenant_pool field
 value has 20-30 tokens.
 Getting all terms in tenant_pool is fast in seconds but when i go with
 facet path after filter criteria then that is very slow. Because
 it is reading whole field from disk and i am only interested in terms.


 On Fri, Jan 16, 2015 at 1:48 PM, Ahmet Arslan iori...@yahoo.com.invalid
 wrote:

  Hi Naresh,
 
  Yup terms component does not respect q or fq parameter.
  Luckily, thats easy with facet component. Example :
   facet=true&facet.field=tenant_pool&q=type:1
 
  Please see more here :
  https://cwiki.apache.org/confluence/display/solr/Faceting
 
  happy faceting,
  ahmet
 
 
 
  On Friday, January 16, 2015 10:13 AM, Naresh Yadav nyadav@gmail.com
 
  wrote:
  Hi ahmet,
 
  Thanks, now i understand better, i will not try my usecase with grouping.
  Actually i am interested in unique terms in a field i.e tenant_pool.
 That i
  get perfectly with http://www.imagesup.net/?di=614212438580
 
  But i am not able to get terms after applying some filter say type:1.
  That is I need unique terms in tenant_pool field for type:1 query
 and
  answer will be P1, L1.
  Please suggest me if i can get this with out reading each doc from disk.
 
 
  On Fri, Jan 16, 2015 at 1:28 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi Naresh,
  
   I have never grouped on a tokenised field and I am not sure it makes
  sense
   to do so.
  
   Reading back ref-guide it says this about group.field parameter
  
   The name of the field by which to group results. The field must be
   single-valued, and either be indexed or a field type that has a value
   source and works in a function query, such as ExternalFileField. It
 must
   also be a string-based field, such as StrField or TextField
  
  
   https://cwiki.apache.org/confluence/display/solr/Result+Grouping
  
   Therefore, it should be single valued. P.S. Don't get confused with
   TextField type, for example it could create single token when used with
   keyword tokenizer.
  
   Ahmet
  
   On Friday, January 16, 2015 4:43 AM, Naresh Yadav 
 nyadav@gmail.com
   wrote:
   Hi ahmet,
  
   If you observe output ngroups is 1 and returning only one group P1.
   But my expectation is it should return three groups P1, L1, L2 as my
   field is tokenized with space.
  
   Please correct me if wrong?
  
  
   On 1/15/15, Ahmet Arslan iori...@yahoo.com.invalid wrote:
   
   
Hi Naresh,
   
Everything looks correct, what is the problem here?
   
If you want to see more than one document per group, there is a
  parameter
for that which defaults to 1.
   
Ahmet
   
   
   
On Thursday, January 15, 2015 9:02 AM, Naresh Yadav 
   nyadav@gmail.com
wrote:
Hi all,
   
I had done following configuration to test Solr grouping concept.
   
solr version :  4.6.1 (tried in latest version 4.10.3 also)
Schema : http://www.imagesup.net/?di=10142124357616
Solrj code to insert docs :
   http://www.imagesup.net/?di=10142124381116
Response Group's :  http://www.imagesup.net/?di=1114212438351
Response Terms' : http://www.imagesup.net/?di=614212438580
   
 Please let me know if I am doing something wrong here
  
 



Re: How to select the correct number of Shards in SolrCloud

2015-01-16 Thread Manohar Sripada
Thanks Daniel and Shawn for your valuable suggestions,

Daniel,
If you have a query and it needs to get results from 64 cores, if 63 return
in 100ms but the last core is in GC pause and takes 500ms, your query will
take just over 500ms.
 There is only a single JVM running per machine. I will get the QTime from
each Solr core and will check if this is the root cause.

Lastly, you mentioned you allocated 32Gb to solr, do you mean to the
JVM heap?
That's quite a lot of a 64Gb machine, you haven't left much for the page
cache.
 Yes, 32GB to Solr's JVM heap. I wanted to enable the Filter & FieldValue
caches, as most of my search queries revolve around filters and facets.
Also, I am planning to use the Document cache.

Shawn,
Each server has 8 CPU cores and 64GB of RAM.  Solr requires a 6GB heap
 Can you please tell me the size of your index? And what is the
size of each large cold shard?
 Can you please suggest any tool that you use for collecting
statistics, like the QTimes for the queries, etc.?

Thanks,
Manohar


On Fri, Jan 16, 2015 at 3:23 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 1/15/2015 10:58 PM, Manohar Sripada wrote:
  The reason I have created 64 Shards is there are 4 CPU cores on each VM;
  while querying I can make use of all the CPU cores. On an average, Solr
  QTime is around 500ms here.
 
  Last time to my other discussion, Erick suggested that I might be over
  sharding, So, I tried reducing the number of shards to 32 and then 16. To
  my surprise, it started performing better. It came down to 300 ms (for 32
  shards) and 100 ms (for 16 shards). I haven't tested with filters and
  facets yet here. But, the simple search queries had shown lot of
  improvement.
 
  So, how come the less number of shards performing better?? Is it because
  there are less number of posting lists to search on OR less merges that
 are
  happening? And how to determine the correct number of shards?

 Daniel has replied with good information.

 One additional problem I can think of when there are too many shards: If
 your Solr server is busy enough to have any possibility of simultaneous
 requests, then you will find that it's NOT a good idea to create enough
 shards to use all your CPU cores.  In that situation, when you do a
 single query, all your CPU cores will be in use.  When multiple queries
 happen at the same time, they have to share the available CPU resources,
 slowing them down.  With a smaller number of shards, the additional CPU
 cores can handle simultaneous queries.

 I have an index with nearly 100 million documents.  I've divided it into
 six large cold shards and one very small hot shard.  It's not SolrCloud.
  I put three large shards on each of two servers, and the small shard on
 one of those two servers.  The distributed query normally happens on the
 server without the small shard.  Each server has 8 CPU cores and 64GB of
 RAM.  Solr requires a 6GB heap.

 My median QTime over the last 231836 queries is 25 milliseconds and my
 95th percentile QTime is 376 milliseconds.  My query rate is pretty low
 - I've never seen Solr's statistics for the 15 minute query rate go
 above a single digit per second.

 Thanks,
 Shawn




Apache Solr quickstart tutorial - error while loading main class SimplePostTool

2015-01-16 Thread Shubhanshu Gupta
I am following the Apache Solr quickstart tutorial
http://lucene.apache.org/solr/quickstart.html. The tutorial comes to
indexing a directory of rich files, which requires running java -Dauto
-Drecursive org.apache.solr.util.SimplePostTool docs/ .

I am getting an error that says: Could not find or load main class
org.apache.solr.util.SimplePostTool, in spite of following the quickstart
tutorial closely. I cannot figure out how to resolve the error and proceed
with the tutorial.
I would wholeheartedly appreciate any help. Thanks in advance.


Regards,
Shubhanshu Gupta

LinkedIn
https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B
|
Twitter https://twitter.com/Shubhanshugupta


Re: Apache Solr quickstart tutorial - error while loading main class SimplePostTool

2015-01-16 Thread Ahmet Arslan
Hi Shubhanshu,

How about this one?

java -classpath dist/solr-core-*jar -Dauto -Drecursive 
org.apache.solr.util.SimplePostTool docs/

Ahmet


On Friday, January 16, 2015 3:13 PM, Shubhanshu Gupta 
shubhanshu.gupt...@gmail.com wrote:
I am following Apache Solr quickstart tutorial
http://lucene.apache.org/solr/quickstart.html. The tutorial comes across
indexing a directory of rich files which requires implementing java -Dauto
-Drecursive org.apache.solr.util.SimplePostTool docs/ .

I am getting an error which says: Could not find or load main class
org.apache.solr.util.SimplePostTool inspite of following the quickstart
tutorial closely. I am not getting how to resolve the error and proceed
ahead with the tutorial.
I would whole-heartedly appreciate any help. Thanks in advance.


Regards,
Shubhanshu Gupta

LinkedIn
https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B
|
Twitter https://twitter.com/Shubhanshugupta


Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Charlie Hull

On 16/01/2015 04:02, Dan Davis wrote:

Why re-write all the document conversion in Java ;)  Tika is very slow.   5
GB PDF is very big.


Or you can run Tika in a separate process, or even on a separate 
machine, wrapped with something to cope if it dies due to some horrible 
input...we generally avoid document format translation within Solr and 
do it externally before feeding documents to Solr.


Charlie


If you have a lot of PDF like that try pdftotext in HTML and UTF-8 output
mode.   The HTML mode captures some meta-data that would otherwise be lost.


If you need to go faster still, you can  also write some stuff linked
directly against poppler library.

Before you jump down my throat about Tika being slow - I wrote a PDF
indexer that ran at 36 MB/s per core.   Different indexer, all C, lots of
setjmp/longjmp.   But fast...



On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote:


Siegfried and Michael Thank you for your replies and help.

-Original Message-
From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
Sent: Thursday, January 15, 2015 3:45 AM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemoryError for PDF document upload into Solr

Hi Ganesh,

you can increase the heap size but parsing a 4 GB PDF document will very
likely consume A LOT OF memory - I think you need to check if that large
PDF can be parsed at all :-)

Cheers,

Siegfried Goeschl

On 14.01.15 18:04, Michael Della Bitta wrote:

Yep, you'll have to increase the heap size for your Tomcat container.

http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial
-heap-size-correctly

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/11200277628550959
3336/posts
w: appinions.com http://www.appinions.com/

On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote:


Hello,

Can someone pass on the hints to get around following error? Is there
any Heap Size parameter I can set in Tomcat or in Solr webApp that
gets deployed in Solr?

I am running Solr webapp inside Tomcat on my local machine which has
RAM of 12 GB. I have PDF document which is 4 GB max in size that
needs to be loaded into Solr




Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
      at java.util.AbstractCollection.toArray(Unknown Source)
      at java.util.ArrayList.<init>(Unknown Source)
      at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
      at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
      at ...



Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Markus Jelsma
Tika 1.6 has PDFBox 1.8.4, which has memory issues, eating excessive RAM! 
Either upgrade to Tika 1.7 (out now) or manually use the PDFBox 1.8.8 
dependency.

M.

On Friday 16 January 2015 15:21:55 Charlie Hull wrote:
 On 16/01/2015 04:02, Dan Davis wrote:
  Why re-write all the document conversion in Java ;)  Tika is very slow.  
  5
  GB PDF is very big.
 
 Or you can run Tika in a separate process, or even on a separate
 machine, wrapped with something to cope if it dies due to some horrible
 input...we generally avoid document format translation within Solr and
 do it externally before feeding documents to Solr.
 
 Charlie
 
  If you have a lot of PDF like that try pdftotext in HTML and UTF-8 output
  mode.   The HTML mode captures some meta-data that would otherwise be
  lost.
  
  
  If you need to go faster still, you can  also write some stuff linked
  directly against poppler library.
  
  Before you jump down my throat about Tika being slow - I wrote a PDF
  indexer that ran at 36 MB/s per core.   Different indexer, all C, lots of
  setjmp/longjmp.   But fast...
  
  On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote:
  Siegfried and Michael Thank you for your replies and help.
  
  -Original Message-
  From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
  Sent: Thursday, January 15, 2015 3:45 AM
  To: solr-user@lucene.apache.org
  Subject: Re: OutOfMemoryError for PDF document upload into Solr
  
  Hi Ganesh,
  
  you can increase the heap size but parsing a 4 GB PDF document will very
  likely consume A LOT OF memory - I think you need to check if that large
  PDF can be parsed at all :-)
  
  Cheers,
  
  Siegfried Goeschl
  
  On 14.01.15 18:04, Michael Della Bitta wrote:
  Yep, you'll have to increase the heap size for your Tomcat container.
  
  http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial
  -heap-size-correctly
  
  Michael Della Bitta
  
  Senior Software Engineer
  
  o: +1 646 532 3062
  
  appinions inc.
  
  “The Science of Influence Marketing”
  
  18 East 41st Street
  
  New York, NY 10017
  
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
  https://plus.google.com/u/0/b/112002776285509593336/11200277628550959
  3336/posts
  w: appinions.com http://www.appinions.com/
  
  On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote:
  Hello,
  
  Can someone pass on the hints to get around following error? Is there
  any Heap Size parameter I can set in Tomcat or in Solr webApp that
  gets deployed in Solr?
  
  I am running Solr webapp inside Tomcat on my local machine which has
  RAM of 12 GB. I have PDF document which is 4 GB max in size that
  needs to be loaded into Solr
  
  
  
  
  Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
        at java.util.AbstractCollection.toArray(Unknown Source)
        at java.util.ArrayList.<init>(Unknown Source)
        at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
        at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at ...

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Jack Krupansky
It would be nice to have a SolrJ-level implementation as well as a
command-line implementation of the extraction request handler, so that app
ingestion code could do the extraction outside of Solr at the app level, and
even as a separate process that streams to the app or Solr. That would permit
the app to do customization, entity extraction, boiler-plate removal, etc. in
app-friendly code, before transport to the Solr server.

The extraction request handler is a really cool feature and quite
sufficient for a lot of scenarios, but additional architectural flexibility
would be a big win.

-- Jack Krupansky
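
As a sketch of that app-level flow with today's pieces (the field names, core URL, and direct Tika usage below are illustrative assumptions, not the proposed SolrJ-level API):

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class ExternalExtract {
        public static void main(String[] args) throws Exception {
            // Parse in the app process (or a separate one), so a crash or
            // OutOfMemoryError on a pathological file cannot take Solr down.
            AutoDetectParser parser = new AutoDetectParser();
            BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
            Metadata metadata = new Metadata();
            try (InputStream in = new FileInputStream(args[0])) {
                parser.parse(in, handler, metadata);
            }

            // Customization, entity extraction, boiler-plate removal, etc. goes here.

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", args[0]);                  // assumed field names
            doc.addField("title", metadata.get("title"));
            doc.addField("text", handler.toString());

            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }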

On Fri, Jan 16, 2015 at 10:21 AM, Charlie Hull char...@flax.co.uk wrote:

 On 16/01/2015 04:02, Dan Davis wrote:

 Why re-write all the document conversion in Java ;)  Tika is very slow.
  5
 GB PDF is very big.


 Or you can run Tika in a separate process, or even on a separate machine,
 wrapped with something to cope if it dies due to some horrible input...we
 generally avoid document format translation within Solr and do it
 externally before feeding documents to Solr.

 Charlie


 If you have a lot of PDF like that try pdftotext in HTML and UTF-8 output
 mode.   The HTML mode captures some meta-data that would otherwise be
 lost.


 If you need to go faster still, you can  also write some stuff linked
 directly against poppler library.

  Before you jump down my throat about Tika being slow - I wrote a PDF
  indexer that ran at 36 MB/s per core.   Different indexer, all C, lots of
  setjmp/longjmp.   But fast...



 On Thu, Jan 15, 2015 at 1:54 PM, ganesh.ya...@sungard.com wrote:

  Siegfried and Michael Thank you for your replies and help.

 -Original Message-
 From: Siegfried Goeschl [mailto:sgoes...@gmx.at]
 Sent: Thursday, January 15, 2015 3:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: OutOfMemoryError for PDF document upload into Solr

 Hi Ganesh,

 you can increase the heap size but parsing a 4 GB PDF document will very
 likely consume A LOT OF memory - I think you need to check if that large
 PDF can be parsed at all :-)

 Cheers,

 Siegfried Goeschl

 On 14.01.15 18:04, Michael Della Bitta wrote:

 Yep, you'll have to increase the heap size for your Tomcat container.

 http://stackoverflow.com/questions/6897476/tomcat-7-how-to-set-initial
 -heap-size-correctly

 Michael Della Bitta

 Senior Software Engineer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/11200277628550959
 3336/posts
 w: appinions.com http://www.appinions.com/

 On Wed, Jan 14, 2015 at 12:00 PM, ganesh.ya...@sungard.com wrote:

  Hello,

 Can someone pass on the hints to get around following error? Is there
 any Heap Size parameter I can set in Tomcat or in Solr webApp that
 gets deployed in Solr?

 I am running Solr webapp inside Tomcat on my local machine which has
 RAM of 12 GB. I have PDF document which is 4 GB max in size that
 needs to be loaded into Solr




 Exception in thread "http-apr-8983-exec-6" java.lang.OutOfMemoryError: Java heap space
       at java.util.AbstractCollection.toArray(Unknown Source)
       at java.util.ArrayList.<init>(Unknown Source)
       at org.apache.pdfbox.cos.COSDocument.getObjects(COSDocument.java:518)
       at org.apache.pdfbox.cos.COSDocument.close(COSDocument.java:575)
       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:254)
       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
       at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
       at ...

Re: Query ReRanking question

2015-01-16 Thread Ravi Solr
As per Erick's suggestion, I am reposting my response to the group. Joel and
Erick, thank you very much for helping me out with the ReRanking question a
while ago.

I have an alternative that seems to be working better for me than
ReRanking; can you kindly let me know of any pitfalls that you can
think of with this approach? Since we value relevancy & recency at
the same time, even though both are mutually exclusive, I thought maybe I
could use function queries to adjust the boost as follows:

boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))

What I intended to do here is: if it matches a more recent doc, it
takes recency into consideration; however, if the relevancy is better than
the date boost, we keep relevancy. What do you guys think ??

Thanks,

Ravi Kiran Bhaskar
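
One way to read the constant: recip(x,m,a,b) computes a/(m*x + b), so with m = 7.889e-10 per millisecond the date term falls to 0.5 at an age of about 1.27e9 ms, roughly two weeks. A sketch of attaching that boost to an edismax request in SolrJ (the query text and client setup are assumptions):

    import org.apache.solr.client.solrj.SolrQuery;

    // ...
    SolrQuery query = new SolrQuery("malaysian airline crash");
    query.set("defType", "edismax"); // boost= is edismax's multiplicative boost
    // max() keeps whichever is larger: the freshness curve, or the
    // relevancy score scaled into the same 0..1 range.
    query.set("boost",
        "max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))");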


On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr ravis...@gmail.com wrote:

 Joel and Erick,
Thank you very much for explaining how the ReRanking works. Now
 its a bit more clear.

 Thanks,

 Ravi Kiran Bhaskar

 On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein joels...@gmail.com wrote:

 Oops wrong usage pattern. It should be:

 1) Main query is sorted by a field (scores tracked silently in the
 background).
 2) Reranker is reRanking docs based on the score from the main query.



 Joel Bernstein
 Search Engineer at Heliosearch


 On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein joels...@gmail.com
 wrote:

  Ok, just reviewed the code. The ReRankingQParserPlugin always tracks the
  scores from the main query. So this explains things. Speaking of
 explaining
  things, the ReRankingParserPlugin also works with Lucene's explain. So
 if
  you use debugQuery=true we should see that the score from the initial
 query
  was combined with the score from the reRankQuery, which should be 1.
 
  You have stumbled on a interesting usage pattern which I never
 considered.
  But basically what's happening is:
 
  1) Main query is sorted by score.
  2) Reranker is reRanking docs based on the score from the main query.
 
  No, worries Erick, you've taught me a lot over the past couple of years!
 
 
 
 
 
 
 
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
 
  On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  Joel:
 
  I find that whenever I say something totally wrong publicly, I
  remember the correction really really well...
 
  Thanks for straightening that out!
  Erick
 
  On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com
  wrote:
   This folllowing query:
  
    http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
    reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
    desc&fl=headline,publish_date,score
  
   Is doing the following:
  
   The main query is sorted by publish_date. Then the results are
 reranked
  by
   *:*, which in theory would have no effect at all.
  
   The reRankQuery only uses the reRankQuery to re-rank the results. The
  sort
   param will always apply to the main query.
  
  
  
  
  
  
  
  
  
  
  
  
   Joel Bernstein
   Search Engineer at Heliosearch
  
  
   On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr ravis...@gmail.com
 wrote:
  
   Erick,
   Your idea about reversing Joel's suggestion seems to give
 the
  best
   results of all the options I tried...but I cant seem to understand
  why. I
   thought the query shown below should give irrelevant results as
  sorting by
   date would throw relevancy off...but somehow its getting relevant
  results
   with fair enough reverse chronology. It is as if the sort is applied
  after
   the docs are collected and reranked (which is what I wanted). One
 more
   thing that baffled me was, if I change reRankDocs from 1000 to100
 the
   results become irrelevant, which doesnt make sense.
  
   So can you kindly explain whats going on in the following query.
  
    http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
    reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
    desc&fl=headline,publish_date,score
  
   I love the solr community, so much to learn from so many
 knowledgeable
   people.
  
   Thanks
  
   Ravi Kiran Bhaskar
  
  
  
   On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
OK, why can't you switch the clauses from Joel's suggestion?
   
Something like:
 q=Malaysia plane crash&rq={!rerank reRankDocs=1000
 reRankQuery=$myquery}&myquery=*:*&sort=date+desc
   
(haven't tried this yet, but you get the idea).
   
Best,
Erick
   
On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Hi - You can already achieve this by boosting on the document's
   recency.
The result set won't be exactly ordered by date but you will get
 the
  most
relevant and recent documents on top.

 Markus

 -Original message-
 From:Ravi Solr ravis...@gmail.com mailto:ravis...@gmail.com
 
 Sent: Friday 5th 

Re: Apache Solr quickstart tutorial - error while loading main class SimplePostTool

2015-01-16 Thread Shubhanshu Gupta
Thanks a lot. It did work. One last favor: can you please explain why
the old command didn't work and why this one did? I do know that the
command you gave assumes that I had not set the environment through
export CLASSPATH=dist/solr-core-4.10.2.jar. But I had already set the
environment, and still there was no effect. Please correct me if I am
wrong anywhere.


Thanks.


LinkedIn
https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B
|
Twitter https://twitter.com/Shubhanshugupta

On Fri, Jan 16, 2015 at 7:26 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Shubhanshu,

 How about this one?

 java -classpath dist/solr-core-*jar -Dauto -Drecursive
 org.apache.solr.util.SimplePostTool docs/

 Ahmet


 On Friday, January 16, 2015 3:13 PM, Shubhanshu Gupta 
 shubhanshu.gupt...@gmail.com wrote:
 I am following Apache Solr quickstart tutorial
 http://lucene.apache.org/solr/quickstart.html. The tutorial comes across
 indexing a directory of rich files which requires implementing java -Dauto
 -Drecursive org.apache.solr.util.SimplePostTool docs/ .

 I am getting an error which says: Could not find or load main class
 org.apache.solr.util.SimplePostTool inspite of following the quickstart
 tutorial closely. I am not getting how to resolve the error and proceed
 ahead with the tutorial.
 I would whole-heartedly appreciate any help. Thanks in advance.


 Regards,
 Shubhanshu Gupta

 LinkedIn
 
 https://www.linkedin.com/profile/view?id=310143808snapshotID=0authType=nameauthToken=cDRNtrk=NUS-body-member-namesl=NPU_REG%3Bno_results%3B-1%3Bactivity%3A5903287270026268672%3B
 
 |
 Twitter https://twitter.com/Shubhanshugupta



Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name

2015-01-16 Thread Tom Burton-West
Hello,

I'm running Solr 4.10.2 out of the box with the Solr example.

i.e. ant example
cd solr/example
java -jar start.jar

in /example/log

At start-up the example gives this message in the log:

WARN  - 2015-01-16 12:31:40.895; org.apache.solr.core.RequestHandlers;
Multiple requestHandler registered to the same name: /update ignoring:
org.apache.solr.handler.UpdateRequestHandler

Is this a bug?   Is there something wrong with the out of the box example
configuration?

Tom


Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
I am using below tutorial for Solr Cloud setup with 2 shards
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster


I am able to get the default setup working. However, I have a requirement 
where my index is not in the default location (data/index), and hence when I 
start the JVM for each shard I run with -Dsolr.data.dir=<custom index path>. 
Now when I query, I get results with numFound > 0 but the doc list is always 
empty.

I verified that my index does have fields stored and indexed (verified by 
loading it as a single core). Has anyone else faced a similar issue, or have 
an idea of what I am missing?

Appreciate any help.

request:

http://localhost:/solr/collection1/select?q=body%3A%22to%22&wt=json&indent=true&shards=http://localhost:/solr/collection1


response:
{ "responseHeader": { "status": 0, "QTime": 18, "params": { "shards": 
"http://localhost:/solr/collection1", "indent": "true", "q": "body:\"to\"", 
"_": "1421390858638", "wt": "json" } }, "response": { "numFound": 2564, 
"start": 0, "maxScore": 0.4523638, "docs": [] } }


Re: Query ReRanking question

2015-01-16 Thread Erick Erickson
Ravi:

Yep, this is the standard way to have recency influence the rank rather than
take over absolute ordering via a sort=date_time or similar.

Of course, how strongly the rank is influenced is more an art than a science
as far as figuring out what actual constants to put in...

Best,
Erick

On Fri, Jan 16, 2015 at 8:03 AM, Ravi Solr ravis...@gmail.com wrote:
 As per Erick's suggestion, reposting my response to the group. Joel and
 Erick, thank you very much for helping me out with the ReRanking question a
 while ago.

 I have an alternative which seems to be working better for me than
 ReRanking; can you kindly let me know of any pitfalls you can think of
 with this approach? Since we value relevancy & recency at the same time,
 even though both are mutually exclusive, I thought maybe I could use
 function queries to adjust the boost as follows:

 boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))

 What I intended to do here is: if it matches a more recent doc it will
 take recency into consideration; however, if the relevancy score is better
 than the date boost, we keep relevancy. What do you think?

 Thanks,

 Ravi Kiran Bhaskar
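
 For reference, a minimal sketch of how such a boost might be attached to a
 request, assuming the edismax parser and a publish_date field (the 7.889e-10
 constant appears to work out to roughly a two-week scale inside recip):

   http://localhost:8080/solr/select?defType=edismax
     &q=malaysian airline crash
     &boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))
     &fl=headline,publish_date,score

 One caveat worth noting: scale(query($q),0,1) has to compute the query score
 for every document in the index to find the min/max, so it can get expensive
 on larger indexes.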


 On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr ravis...@gmail.com wrote:

 Joel and Erick,
Thank you very much for explaining how the ReRanking works. Now
 its a bit more clear.

 Thanks,

 Ravi Kiran Bhaskar

 On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein joels...@gmail.com wrote:

 Oops wrong usage pattern. It should be:

 1) Main query is sorted by a field (scores tracked silently in the
 background).
 2) Reranker is reRanking docs based on the score from the main query.



 Joel Bernstein
 Search Engineer at Heliosearch


 On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein joels...@gmail.com
 wrote:

  Ok, just reviewed the code. The ReRankingQParserPlugin always tracks the
  scores from the main query. So this explains things. Speaking of
 explaining
 things, the ReRankingQParserPlugin also works with Lucene's explain. So
 if
  you use debugQuery=true we should see that the score from the initial
 query
  was combined with the score from the reRankQuery, which should be 1.
 
  You have stumbled on an interesting usage pattern which I never
 considered.
  But basically what's happening is:
 
  1) Main query is sorted by score.
  2) Reranker is reRanking docs based on the score from the main query.
 
  No worries, Erick, you've taught me a lot over the past couple of years!
 
 
 
 
 
 
 
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
 
  On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  Joel:
 
  I find that whenever I say something totally wrong publicly, I
  remember the correction really really well...
 
  Thanks for straightening that out!
  Erick
 
  On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com
  wrote:
   The following query:
  
    http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
    reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc
    &fl=headline,publish_date,score
  
   Is doing the following:
  
   The main query is sorted by publish_date. Then the results are
 reranked
  by
   *:*, which in theory would have no effect at all.
  
    The rerank parser only uses the reRankQuery to re-rank the results. The
    sort param will always apply to the main query.
  
  
  
  
  
  
  
  
  
  
  
  
   Joel Bernstein
   Search Engineer at Heliosearch
  
  
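Joel's debugQuery suggestion above can be tried directly against the reversed
query; a sketch, assuming the same schema and port used earlier in this
thread:

  http://localhost:8080/solr/select?q=malaysian airline crash
    &rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*
    &sort=publish_date desc&fl=headline,publish_date,score&debugQuery=true

The explain output should show the main-query score being combined with the
reRankQuery score (a constant 1 for *:*), which is why the top reRankDocs
documents keep a relevance-influenced order.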

Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Erick Erickson
Any chance that you've defined rows=0 in your handler? Or is it possible
that you have not set stored=true for any of your fields?

Best,
Erick



Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
Verified that all my fields are stored and marked as indexed:
<field name="body" type="string" indexed="true" stored="true" multiValued="true" />



-- 
http://localhost:/solr/collection1/query?q=body%3A%22from%22&wt=json&indent=true&shards=http://localhost:/solr/collection1&start=1&rows=10&shards.info=true

{
  "responseHeader": {
    "status": 0,
    "QTime": 19,
    "params": {
      "shards": "http://localhost:/solr/collection1",
      "indent": "true",
      "start": "1",
      "q": "body:\"from\"",
      "shards.info": "true",
      "wt": "json",
      "rows": "10"
    }
  },
  "shards.info": {
    "http://localhost:/solr/collection1": {
      "numFound": 1717,
      "maxScore": 0.5327856,
      "shardAddress": "http://localhost:/solr/collection1",
      "time": 12
    }
  },
  "response": {
    "numFound": 1707,
    "start": 1,
    "maxScore": 0.5327856,
    "docs": [ ]
  }
}





Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
As I said earlier - the single-core setup works fine with the same 
solrconfig.xml and schema.xml:

cd example
java -Djetty.port= -Dsolr.data.dir=/index/path -jar start.jar 

I am running Solr 4.10. Do I need to change any other configuration for 
running in SolrCloud mode?
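
For comparison, the two-shard startup from Example A on that wiki page looks
roughly like this (a sketch; ports and paths are the wiki's defaults, and the
-Dsolr.data.dir additions are an assumption carried over from above):

  # shard 1: runs embedded ZooKeeper, bootstraps the config, declares 2 shards
  java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf \
       -Dcollection.configName=myconf -Dsolr.data.dir=/index/path1 -jar start.jar

  # shard 2: points at the ZooKeeper started above (jetty port + 1000 = 9983)
  cd example2
  java -Djetty.port=7574 -DzkHost=localhost:9983 -Dsolr.data.dir=/index/path2 \
       -jar start.jar

Without -DzkRun/-DzkHost (and -DnumShards on the first node) the instances
come up as plain single cores rather than as a cloud.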





Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
One more point: in cloud mode, if I submit a request with fl=id, it returns 
the doc list. But when I add any other field, I get an empty doc list.

http://localhost:/solr/select?q=domain:ebay&wt=json&shards=http://localhost:/solr/&fl=id&rows=1

{
  "responseHeader": {
    "status": 0,
    "QTime": 7,
    "params": {
      "fl": "id",
      "shards": "http://localhost:/solr/",
      "q": "domain:ebay",
      "wt": "json",
      "rows": "1"
    }
  },
  "response": {
    "numFound": 17,
    "start": 0,
    "maxScore": 3.8559604,
    "docs": [
      {
        "id": "d8406557-6cd8-46d9-9a5e-29844387afc4"
      }
    ]
  }
}


Note: all of the above works in single-core mode.





Re: Solr example for Solr 4.10.2 gives warning about Multiple request handlers with same name

2015-01-16 Thread Michael Sokolov
I've seen the same thing, poked around a bit and eventually decided to 
ignore it.  I think there may be a ticket related to that saying it's a 
logging bug (i.e., not a real issue), but I couldn't swear to it.


-Mike
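
For anyone else hitting this: the warning fires whenever two requestHandler
registrations in solrconfig.xml resolve to the same name, along the lines of
this sketch (not the literal example config):

  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <!-- a second registration under the same name is logged and ignored -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />

As the log line says, Solr ignores the later registration and keeps serving
/update with the earlier one, so the example continues to work either way.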






Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Anshum Gupta
Looks like a config issue to me more than anything else.
Can you share your solrconfig? You will not be able to attach a file here
but you could share it via pastebin or something similar.
Also, why are you adding the shards=http://localhost:8983/solr/collection1
part to your request? You don't need to do that in most cases.





-- 
Anshum Gupta
http://about.me/anshumgupta


Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
I followed all the steps listed here: 
http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster

I have not updated solrconfig.xml; it is the same as what comes by default 
with 4.10.

The only thing I added was the list of my fields in 
example/solr/collection1/conf/schema.xml

@shards: If I query without that param, it returns the below error:
http://localhost:/solr/collection1/select?q=*:*
<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>
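
That error normally means no live replica is registered for one of the
shards. One way to inspect the cluster state, assuming the default 8983 port
(the Collections API CLUSTERSTATUS action should be available in Solr 4.10):

  http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json

The Cloud tab of the admin UI (http://localhost:8983/solr/#/~cloud) shows the
same state graphically, including which replicas are down.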


 

 






Re: Solr numFound > 0 but doc list empty in Solr Cloud setup

2015-01-16 Thread Jaikit Savla
Anshum,

You are right about the @shards param not being required. One of my shards 
was down, and hence when I added shards.tolerant=true it worked without the 
shards param. However, the document list is still empty.


content of solrconfig.xml
http://pastebin.com/CJxD22t1

 






Solr Cloud Stress Test

2015-01-16 Thread david mitche
Hi,

   I am a student, planning to learn and do a features-and-functionality
test of SolrCloud as one of my projects. I would like to do stress and
performance testing of SolrCloud on my local machine (16 GB RAM, 250 GB SSD,
and a 2.2 GHz Intel Core i7), covering multiple features of the cloud. What
is the recommended way to get started with it?

Thanks.
David


Need Debug Direction on Performance Problem

2015-01-16 Thread Naresh Yadav
Hi all,

We have a single Solr index with 3 fixed fields (one of the fields is 
tokenized with space) and the rest dynamic fields (string fields, in the 
range of 10-20).

The current size of the index is 2 GB with around 12 lakh (1.2 million) docs, 
and the Solr nodes are 4-core, 16 GB RAM Linux machines.

Write performance is good. We then tested one read query (in the select query 
we are applying filter criteria on the tokenized field & reading only the 
score field; no grouping/faceting) in two setups:

*Setup 1:* Single-node cloud with shards=1, replication=1.
In this setup the whole 12 lakh docs are on the same machine. Our filter 
query, reading around 10 lakh docs with only the score field, takes *1 
minute*.

*Setup 2:* Two-node cloud with shards=2, replication=1.
In this setup 6 lakh docs are on node1 and 6 lakh on node2. The same filter 
query, reading around 10 lakh docs with only the score field, takes *114 
minutes*.

Please guide us on the possible reasons for this degradation of performance 
after sharding the index. How can we check where the Solr server is taking 
time to return results?

Thanks
Naresh