SQL JOIN eta

2017-03-14 Thread Damien Kamerman
Hi all, does anyone know roughly when the SQL JOIN functionality will be
released? Is there a Jira for this? I'm guessing this might be in Solr 6.6.

Cheers,
Damien.


Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Ok, thanks for the heads up.

I'll review the solrconfig.xml first.

Regards,
Edwin


On 15 March 2017 at 00:23, Joel Bernstein  wrote:

> Yeah, there has been a lot of changes to configs in Solr 6. All the
> streaming request handlers have now been made implicit so the
> solrconfig.xml doesn't include them. Something seems to be stepping on the
> implicit configs.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 12:20 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> wrote:
>
> > Could it be because the solrconfig.xml was created in Solr 5.x, and was
> > upgraded to Solr 6.x, and there is something which I have missed out
> during
> > the upgrading?
> >
> > So far for this server, only the schema.xml and solrconfig.xml was
> carried
> > forward and modified from Solr 5.x. The files for Solr 6.4.1 were
> > downloaded directly from the Solr website, and the index were indexed
> > directly in Solr 6.4.1.
> >
> > Regards,
> > Edwin
> >
> >
> > On 14 March 2017 at 23:53, Joel Bernstein  wrote:
> >
> > > Yeah, something is wrong with the configuration, because /export only
> > > should be returning json. Have you changed the configurations?
> > >
> > > What were the exact steps you used in setting up the server?
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > This is what get from query:
> > > >
> > > > 
> > > > 
> > > > true
> > > > 0
> > > > 0
> > > > 
> > > > 
> > > > 
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 14 March 2017 at 22:33, Joel Bernstein 
> wrote:
> > > >
> > > > > try running the following query:
> > > > >
> > > > > > http://localhost:8983/solr/email/export?q={!terms+f%3Dfrom}ed...@mail.com
> > > > > > &distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=2.2
> > > > >
> > > > > Let's see what comes back from this.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <
> > > > > edwinye...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > I have only managed to find these above the stack trace.
> > > > > >
> > > > > > 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> > > > > > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> > > > > > params={wt=json&_=1489500479108=0} status=0 QTime=0
> > > > > > 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email
> > s:shard1
> > > > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > > > path=/stream
> > > > > > params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> > > > > > >from",+gather%3D"to")}
> > > > > > status=0 QTime=0
> > > > > > 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email
> > s:shard1
> > > > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > > > path=/export
> > > > > > params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> > > > > > fl=from,to=to+asc,from+asc=json=2.2}
> > > > > > hits=2471 status=0 QTime=19
> > > > > > 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email
> > s:shard1
> > > > > > r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> > > > > > java.lang.RuntimeException: java.util.concurrent.
> > ExecutionException:
> > > > > > java.lang.RuntimeException: java.io.IOException:
> > > > > > java.util.concurrent.ExecutionException: java.io.IOException:
> -->
> > > > > > http://localhost:8983/solr/email/: An exception has occurred on
> > the
> > > > > > server,
> > > > > > refer to server log for details.
> > > > > >
> > > > > > I am getting these logs from {solrHome}/server/log. Is this the
> > > correct
> > > > > > folder to get the log, or is there another folder which may
> contain
> > > the
> > > > > > error?
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 14 March 2017 at 21:47, Joel Bernstein 
> > > wrote:
> > > > > >
> > > > > > > You're getting json parse errors, that look like your getting
> an
> > > XML
> > > > > > > response. Do you see any errors in the logs other then the
> stack
> > > > > trace. I
> > > > > > > suspect there might be another error above the stack trace
> which
> > > > shows
> > > > > > the
> > > > > > > error from the server that causing it to respond with XML.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Joel Bernstein
> > > > > > > http://joelsolr.blogspot.com/
> > > > > > >
> > > > > > > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > > > > > > edwinye...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Joel,
> > > > > > > >
> > > > > > > > >One thing it could be is that gatherNodes will only work on
> > > single
> > > > > > value
> > > > > > > > >fields currently.
> > > > > > > >
> > > > > > > > Regarding this, the fields which I am using in the query are
> > > > > > > > already single value fields, not multi-value fields.

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
bq: If I changed the routing strategy back to composite (which it should be). is
it ok?

I sincerely doubt it. The docs have already been routed to the wrong
place (actually, I'm not sure how it worked at all). You can't get
them redistributed simply by changing the definition in ZooKeeper,
they're _already_ in the wrong place.

I'd tear down the corrupted data center and rebuild the collection.
Here "tear down" means deleting all the affected collections and starting
over again.

On the plus side, if you can get a window during which you are _not_
indexing you can copy the indexes from one of your good data centers
to the new one. Do it like this:


- Stop indexing.

- Set up the new collection in the corrupted data center. It's
important that it have _exactly_ the same number of shards as the DC
you're going to transfer _from_. Also, make it leader-only, i.e.
exactly 1 replica per shard.

- copy the indexes over from the good data center to the corresponding
shards. Here "corresponding" means that the source and destination
have the same hash range, which you can see from the state.json (or
clusterstate.json if you're on an earlier format). NOTE: there are two
ways to do this:
-- Just do file copies, scp, hand carry CDs, whatever. Solr should be
offline in the target data center.
-- use the replication API to issue a "fetchindex" command (see the
sketch after these steps). This works even in cloud mode; all the
target Solr instance needs is access to a URL it can pull from. Solr
of course needs to be running in this case.

- Bring up Solr on the target data center and verify it's working.

- Use the Collections API to ADDREPLICA on the target system until you
build out the collection with the numbers of replicas you want.

- Start indexing to the target data center.
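
Roughly, the two API calls mentioned above look like this (host, core and
node names are placeholders to adapt to your setup):

  # pull the index into the target core via the replication handler
  http://target-host:8983/solr/email_shard1_replica1/replication?command=fetchindex&masterUrl=http://source-host:8983/solr/email_shard1_replica1

  # then build out replicas with the Collections API
  http://target-host:8983/solr/admin/collections?action=ADDREPLICA&collection=email&shard=shard1&node=target-host:8983_solr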

The bit about shutting off indexing is a safety measure; it
guarantees that the indexes are consistent. If you can't shut indexing
down during the transfer, you'll need to index docs to the
newly-rebuilt cluster in some manner that guarantees the two DCs will
eventually have the same docs.

Best,
Erick



On Tue, Mar 14, 2017 at 3:26 PM, vbindal  wrote:
> I think I dint explain properly.
>
> I have 3 data centers each with its own SOLR cloud.
>
> My original strategy was composite routing but when one data center went
> down and we brought it back, somehow the routing strategy on this changed to
> implicit (Other 2 DC still have composit and they are working absolutely
> fine).
>
> This might be the reason for the data corruption on that DS because the
> routing strategy got changed.
>
> If I changed the routing strategy back to composite (which it should be). is
> it ok? Do I need to do anything more than simply changing the strategy in
> the clusterState.json?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4325001.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
Wow, this is interesting! Is it going to be a new addition to Solr, or is it
already available? I cannot find it in the documentation. I am using Solr
version 6.4.1.

On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein  wrote:

> I'm going to add a "cartesian" function that create a cartesian product
> from a multi-value field. This will turn a single tuple with a multi-value
> into multiple tuples with a single value field. This will allow the fetch
> operation to work on ancestors. It also has many other use cases. Sample
> syntax:
>
> fetch(collection1,
>  cartesian(field=ancestors,
>  having(gatherNodes(collection1,
>
>  search(collection1,
>
>  q="*:*",
>
>  fl="conceptid",
>
>  sort="conceptid asc",
>
>  fq=storeid:"524efcfd505637004b1f6f24",
>
>  fq=tags:"Company",
>
>  fq=tags:"Prospects2",
>
>  qt="/export"),
>
> walk=conceptid->eventParticipantID,
>
> gather="eventID",
>   t
> rackTraversal="true",
>
> scatter="leaves",
> count(*)),
>  gt(count(*),1))),
>  fl="concept_name",
>  on="ancestors=conceptid")
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel 
> wrote:
>
> > Hi, Joel. Thanks for the reply.
> >
> > So, I need to do some graph traversal queries for my use case. In my data
> > set, I have concepts and events.
> >
> > concept : {name, address, bio ..},
> > > event: {name, date, participantIds:[concept1, concept2...] .}
> >
> >
> > Events connects two or more concepts. So, this is a graph data where
> > concepts are connected to each other via events. Each event store links
> to
> > the concepts that it connects. So the field which stores those links is
> > multivalued. This is a natural structure for my data on which I wanted to
> > do some advanced graph traversal queries with some streaming expression.
> > However, gatherNodes() function does not support multivalued fields yet.
> > So, I changed my index structure to be something like this.
> >
> > concept : {conceptId, name, address, bio ..},
> > > event: {eventId, name, date, participantIds:[concept1, concept2...]
> > .}
> > > *create eventLink documents for each participantId in each
> > > event
> > > eventLink:{eventid, conceptid, id}
> >
> >
> >
> > I created eventLink documents from each event so that I can traverse the
> > data using gatherNodes() function. With this change, I was able to do
> graph
> > query and get Ids of concepts which I wanted. However, I only have ids of
> > concepts. Now, using these ids, I want additional data from concept
> > documents like concept_name or address or bio.  This is what I was trying
> > to achieve with fetch() function but it seems I hit the multivalued
> > limitation again :) The reason why I am storing only the ids in eventLink
> > documents is because I don't want to duplicate data unnecessarily. It
> will
> > complicate maintenance of consistency in index when delete/update
> happens.
> > Is there any way I can achieve this?
> >
> > Thanks!
> > Pratik
> >
> >
> >
> >
> >
> > On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein 
> > wrote:
> >
> > > Wow that's an interesting expression!
> > >
> > > The problem is that you are trying to fetch using the ancestors field,
> > > which is multi-valued. fetch doesn't support multi-value join keys. I
> > never
> > > thought someone might try to do that.
> > >
> > > So , your attempting to get the concept names for ancestors?
> > >
> > > Can you explain a little more about the use case?
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel 
> > > wrote:
> > >
> > > > I have two types of documents in my index. eventLink and
> concepttData.
> > > >
> > > > eventLink  { ancestors:[,] }
> > > > conceptData-{ id:id1, conceptid, concept_name . > data> }
> > > >
> > > > Both are in same collection.
> > > > In my query, I am doing a gatherNodes query wrapped in some other
> > > function
> > > > and ultimately I am getting a bunch of eventLink documents. Now, I am
> > > > trying to get conceptData document for each id specified in
> eventLink's
> > > > ancestors field. I am trying to do that using fetch() function. Here
> is
> > > > simplified form of my query.
> > > >
> > > > fetch(collection1,
> > > > >  function to get eventLinks,
> > > > >   fl="concept_name",
> > > > >   on="ancestors=conceptid"
> > > > > )
> > > >
> > > >
> > > > On executing this query, I am getting back same set of documents
> which
> > > are
> > > > results of my streaming expression containing gatherNodes() function.
> > No
> > > > fields are added to the tuples. From documentation, it seems like
> fetch
> > > > would fetch additional data and add it to the tuples.

Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
I'm going to add a "cartesian" function that creates a cartesian product
from a multi-value field. This will turn a single tuple with a multi-value
field into multiple tuples with a single-value field. This will allow the fetch
operation to work on ancestors. It also has many other use cases. Sample
syntax:

fetch(collection1,
      cartesian(field=ancestors,
                having(gatherNodes(collection1,
                                   search(collection1,
                                          q="*:*",
                                          fl="conceptid",
                                          sort="conceptid asc",
                                          fq=storeid:"524efcfd505637004b1f6f24",
                                          fq=tags:"Company",
                                          fq=tags:"Prospects2",
                                          qt="/export"),
                                   walk=conceptid->eventParticipantID,
                                   gather="eventID",
                                   trackTraversal="true",
                                   scatter="leaves",
                                   count(*)),
                       gt(count(*),1))),
      fl="concept_name",
      on="ancestors=conceptid")

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel  wrote:

> Hi, Joel. Thanks for the reply.
>
> So, I need to do some graph traversal queries for my use case. In my data
> set, I have concepts and events.
>
> concept : {name, address, bio ..},
> > event: {name, date, participantIds:[concept1, concept2...] .}
>
>
> Events connects two or more concepts. So, this is a graph data where
> concepts are connected to each other via events. Each event store links to
> the concepts that it connects. So the field which stores those links is
> multivalued. This is a natural structure for my data on which I wanted to
> do some advanced graph traversal queries with some streaming expression.
> However, gatherNodes() function does not support multivalued fields yet.
> So, I changed my index structure to be something like this.
>
> concept : {conceptId, name, address, bio ..},
> > event: {eventId, name, date, participantIds:[concept1, concept2...]
> .}
> > *create eventLink documents for each participantId in each
> > event
> > eventLink:{eventid, conceptid, id}
>
>
>
> I created eventLink documents from each event so that I can traverse the
> data using gatherNodes() function. With this change, I was able to do graph
> query and get Ids of concepts which I wanted. However, I only have ids of
> concepts. Now, using these ids, I want additional data from concept
> documents like concept_name or address or bio.  This is what I was trying
> to achieve with fetch() function but it seems I hit the multivalued
> limitation again :) The reason why I am storing only the ids in eventLink
> documents is because I don't want to duplicate data unnecessarily. It will
> complicate maintenance of consistency in index when delete/update happens.
> Is there any way I can achieve this?
>
> Thanks!
> Pratik
>
>
>
>
>
> On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein 
> wrote:
>
> > Wow that's an interesting expression!
> >
> > The problem is that you are trying to fetch using the ancestors field,
> > which is multi-valued. fetch doesn't support multi-value join keys. I
> never
> > thought someone might try to do that.
> >
> > So , your attempting to get the concept names for ancestors?
> >
> > Can you explain a little more about the use case?
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel 
> > wrote:
> >
> > > I have two types of documents in my index. eventLink and concepttData.
> > >
> > > eventLink  { ancestors:[,] }
> > > conceptData-{ id:id1, conceptid, concept_name . data> }
> > >
> > > Both are in same collection.
> > > In my query, I am doing a gatherNodes query wrapped in some other
> > function
> > > and ultimately I am getting a bunch of eventLink documents. Now, I am
> > > trying to get conceptData document for each id specified in eventLink's
> > > ancestors field. I am trying to do that using fetch() function. Here is
> > > simplified form of my query.
> > >
> > > fetch(collection1,
> > > >  function to get eventLinks,
> > > >   fl="concept_name",
> > > >   on="ancestors=conceptid"
> > > > )
> > >
> > >
> > > On executing this query, I am getting back same set of documents which
> > are
> > > results of my streaming expression containing gatherNodes() function.
> No
> > > fields are added to the tuples. From documentation, it seems like fetch
> > > would fetch additional data and add it to the tuples. However, that is
> > not
> > > happening. Resulting tuples does not have concept_name field in them.
> > What
> > > am I missing here? I really need to get this additional data from one
> > solr
> > > query so that I don't have to iterate over the eventLinks and get
> > > additional data by individual queries. That would badly impact
> > performance.
> > > Any suggestions?
> > >
> > > Here is my actual query and the response.
> > >
> > >
> > > fetch(collection1,
> > > >  having(
> > > > gatherNodes(collection1,

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
I think I didn't explain it properly.

I have 3 data centers, each with its own Solr cloud.

My original strategy was composite routing, but when one data center went
down and we brought it back, somehow the routing strategy on this one changed to
implicit (the other 2 DCs still have compositeId and they are working absolutely
fine).

This might be the reason for the data corruption on that DC, because the
routing strategy got changed.

If I change the routing strategy back to composite (which it should be), is
it ok? Do I need to do anything more than simply changing the strategy in
the clusterstate.json?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4325001.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet? Search problem

2017-03-14 Thread David Hastings
Glad it worked for you. I'm planning on some experimentation using that
feature; it could contribute to an interface nicely if thought through well.

On Tue, Mar 14, 2017 at 2:25 PM, Scott Smith 
wrote:

> Grouping appears to be exactly what I'm looking for.  I added
> "group=true=category" to my search and It appears that I get
> a list of groups, one document in each group that matches the search along
> with (bonus) the number of documents in the category that match that
> search. Perfect.  Thank you very much.
>
> -Original Message-
> From: Dave [mailto:hastings.recurs...@gmail.com]
> Sent: Monday, March 13, 2017 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet? Search problem
>
> Perhaps look into grouping on that field.
>
> > On Mar 13, 2017, at 9:08 PM, Scott Smith 
> wrote:
> >
> > I'm trying to solve a search problem and wondering if facets (or
> something else) might solve the problem.
> >
> > Let's assume I have a bunch of documents (100 million+).  Each document
> has a category (keyword) assigned to it.  A single document my only have
> one category, but there may be multiple documents with the same category (1
> to a few hundred documents may be in any one category).  There are several
> million categories.
> >
> > Supposed I'm doing a search with a page size of 50.  What I want to do
> is do a search (e.g., "dog") and get back the top 50 documents that match
> the contain the word "dog" and are all in different categories.  So, there
> needs to be one document from 50 different categories.
> >
> > If that's not possible, then is it possible to do it if I know the 50
> categories up-front and hand that off as part of the search (so "find 50
> documents that match the term 'dog' and there is one document from each of
> 50 specified categories").
> >
> > Is there a way to do this?
> >
> > I'm not extremely knowledgeable about facets, but thought that might be
> a solution.  But, it doesn't have to be facets.
> >
> > Thanks for any help
> >
> > Scott
> >
> >
>


RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Thanks.  I'll look at that as well.

-Original Message-
From: Stefan Matheis [mailto:matheis.ste...@gmail.com] 
Sent: Tuesday, March 14, 2017 1:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Facet? Search problem

Scott

Depending on what you're looking for
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
might be worth a look as well.

-Stefan

On Mar 14, 2017 7:25 PM, "Scott Smith"  wrote:

> Grouping appears to be exactly what I'm looking for.  I added 
> "group=true=category" to my search and It appears that I 
> get a list of groups, one document in each group that matches the 
> search along with (bonus) the number of documents in the category that 
> match that search. Perfect.  Thank you very much.
>
> -Original Message-
> From: Dave [mailto:hastings.recurs...@gmail.com]
> Sent: Monday, March 13, 2017 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet? Search problem
>
> Perhaps look into grouping on that field.
>
> > On Mar 13, 2017, at 9:08 PM, Scott Smith 
> wrote:
> >
> > I'm trying to solve a search problem and wondering if facets (or
> something else) might solve the problem.
> >
> > Let's assume I have a bunch of documents (100 million+).  Each 
> > document
> has a category (keyword) assigned to it.  A single document my only 
> have one category, but there may be multiple documents with the same 
> category (1 to a few hundred documents may be in any one category).  
> There are several million categories.
> >
> > Supposed I'm doing a search with a page size of 50.  What I want to 
> > do
> is do a search (e.g., "dog") and get back the top 50 documents that 
> match the contain the word "dog" and are all in different categories.  
> So, there needs to be one document from 50 different categories.
> >
> > If that's not possible, then is it possible to do it if I know the 
> > 50
> categories up-front and hand that off as part of the search (so "find 
> 50 documents that match the term 'dog' and there is one document from 
> each of
> 50 specified categories").
> >
> > Is there a way to do this?
> >
> > I'm not extremely knowledgeable about facets, but thought that might 
> > be
> a solution.  But, it doesn't have to be facets.
> >
> > Thanks for any help
> >
> > Scott
> >
> >
>


RE: Facet? Search problem

2017-03-14 Thread Stefan Matheis
Scott

Depending on what you're looking for
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
might be worth a look as well.
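
Roughly (reusing the "category" field from earlier in the thread), collapsing
to one document per category, with optional expansion of each group:

  q=dog&fq={!collapse field=category}&expand=true&rows=50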

-Stefan

On Mar 14, 2017 7:25 PM, "Scott Smith"  wrote:

> Grouping appears to be exactly what I'm looking for.  I added
> "group=true=category" to my search and It appears that I get
> a list of groups, one document in each group that matches the search along
> with (bonus) the number of documents in the category that match that
> search. Perfect.  Thank you very much.
>
> -Original Message-
> From: Dave [mailto:hastings.recurs...@gmail.com]
> Sent: Monday, March 13, 2017 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Facet? Search problem
>
> Perhaps look into grouping on that field.
>
> > On Mar 13, 2017, at 9:08 PM, Scott Smith 
> wrote:
> >
> > I'm trying to solve a search problem and wondering if facets (or
> something else) might solve the problem.
> >
> > Let's assume I have a bunch of documents (100 million+).  Each document
> has a category (keyword) assigned to it.  A single document my only have
> one category, but there may be multiple documents with the same category (1
> to a few hundred documents may be in any one category).  There are several
> million categories.
> >
> > Supposed I'm doing a search with a page size of 50.  What I want to do
> is do a search (e.g., "dog") and get back the top 50 documents that match
> the contain the word "dog" and are all in different categories.  So, there
> needs to be one document from 50 different categories.
> >
> > If that's not possible, then is it possible to do it if I know the 50
> categories up-front and hand that off as part of the search (so "find 50
> documents that match the term 'dog' and there is one document from each of
> 50 specified categories").
> >
> > Is there a way to do this?
> >
> > I'm not extremely knowledgeable about facets, but thought that might be
> a solution.  But, it doesn't have to be facets.
> >
> > Thanks for any help
> >
> > Scott
> >
> >
>


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
That would make the problem even worse. If you created the collection
with implicit routing, there are no hash ranges for each shard.
CompositeId requires hash ranges to be defined for each shard. Don't
even try.

Best,
Erick

On Tue, Mar 14, 2017 at 11:13 AM, vbindal  wrote:
> Compared it against the other 2 datacenters and they both have `compositeId
> `.
>
> This started happening after 1 of our zookeeper died due to hardware issue
> and we had to setup a new zookeeper machine. update the config in all the
> solr machine and restart the cloud. My guess is something went wrong and
> `implicit` router got created.
>
> Can I simply change the `clusterstate.json` to take care of this?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324950.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Grouping appears to be exactly what I'm looking for.  I added
"group=true&group.field=category" to my search and it appears that I get a list
of groups, one document in each group that matches the search, along with
(bonus) the number of documents in the category that match that search.
Perfect.  Thank you very much.
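
Spelled out, the grouping parameters look roughly like this (group.limit and
group.ngroups are optional extras beyond what was used above):

  q=dog&group=true&group.field=category&group.limit=1&group.ngroups=true&rows=50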

-Original Message-
From: Dave [mailto:hastings.recurs...@gmail.com] 
Sent: Monday, March 13, 2017 7:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Facet? Search problem

Perhaps look into grouping on that field. 

> On Mar 13, 2017, at 9:08 PM, Scott Smith  wrote:
> 
> I'm trying to solve a search problem and wondering if facets (or something 
> else) might solve the problem.
> 
> Let's assume I have a bunch of documents (100 million+).  Each document has a 
> category (keyword) assigned to it.  A single document my only have one 
> category, but there may be multiple documents with the same category (1 to a 
> few hundred documents may be in any one category).  There are several million 
> categories.
> 
> Supposed I'm doing a search with a page size of 50.  What I want to do is do 
> a search (e.g., "dog") and get back the top 50 documents that match the 
> contain the word "dog" and are all in different categories.  So, there needs 
> to be one document from 50 different categories.
> 
> If that's not possible, then is it possible to do it if I know the 50 
> categories up-front and hand that off as part of the search (so "find 50 
> documents that match the term 'dog' and there is one document from each of 50 
> specified categories").
> 
> Is there a way to do this?
> 
> I'm not extremely knowledgeable about facets, but thought that might be a 
> solution.  But, it doesn't have to be facets.
> 
> Thanks for any help
> 
> Scott
> 
> 


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Compared it against the other 2 data centers and they both have
`compositeId`.

This started happening after one of our ZooKeeper nodes died due to a hardware
issue and we had to set up a new ZooKeeper machine, update the config on all the
Solr machines and restart the cloud. My guess is something went wrong and an
`implicit` router got created.

Can I simply change the `clusterstate.json` to take care of this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324950.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread alexpusch
I ended up using ValueSource and FunctionValues (as used in StatsComponent):

// resolve a ValueSource for the field and pull per-document values
// from the leaf reader context (ctx); doc ids are segment-local
FieldType fieldType = schemaField.getType();
ValueSource valueSource = fieldType.getValueSource(schemaField, null);
FunctionValues values = valueSource.getValues(Collections.emptyMap(), ctx);

values.strVal(docId); // string value of the field for this doc

I hope that's analogous to your suggested method.

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Iterating-sorted-result-docs-in-a-custom-search-component-tp4324497p4324947.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
The default router has always been compositeId, but when you created
your collection you may have created it with implicit. Looking at the
clusterstate.json and/or state.json in the individual collection
should show you (admin UI>>cloud>>tree).
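
What you're looking for there is the "router" entry; for a compositeId
collection it reads roughly:

  "router":{"name":"compositeId"}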

But we need to be very clear about what a "duplicate" document is.
Solr routes/replaces documents based on whatever you've defined as
the <uniqueKey> in your schema file (assuming compositeId routing). When
you say you get dups when you re-index, it sounds like you are somehow
using different <uniqueKey>s for what you consider the "same"
document.

bq: Also, This started happening after 1 of our zookeeper died due to hardware
issue and we had to setup a new zookeeper machine. update the config in all
the solr machine and restart the cloud.

Hmmm. "update the config in all the solr machine...". You should not
have to do this in SolrCloud. All the configs are stored in Zookeeper
and loaded from ZK when the Solr instance starts. What it's starting
to sound like is that you've somehow mixed up SorlCloud and older
"stand-alone" concepts and "somehow" your restoration process messed
up your configs.

So if the issue isn't that you're somehow using different
's, I'd recommend just starting over with a new collection
since you can re-index from scratch.

Best,
Erick

On Tue, Mar 14, 2017 at 10:35 AM, vbindal  wrote:
> Hi Shawn,
>
> We are on 4.10.0 version. Is that the default router in this version? Also,
> we dont see all the documents duplicated, only some of them. I have a
> indexer job to index data in SOLR. After I delete all the records and run
> this job, the count is correct but when I run the job again, we start seeing
> higher count and duplicate records (random records) in shards.
>
> Also, This started happening after 1 of our zookeeper died due to hardware
> issue and we had to setup a new zookeeper machine. update the config in all
> the solr machine and restart the cloud.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324937.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Shawn Heisey
On 3/14/2017 10:23 AM, Elodie Sannier wrote:
> The request close() method decrements the reference count on the
> searcher.

From what I could tell, that method decrements the reference counter,
but does not actually close the searcher object.  I cannot tell you what
the correct procedure is to make sure that all resources are properly
closed at the proper time.  This might be a bug, or there might be
something missing from your code.  I do not know which.

Thanks,
Shawn



Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Hi Shawn,

We are on version 4.10.0. Is that the default router in this version? Also,
we don't see all the documents duplicated, only some of them. I have an
indexer job to index data in Solr. After I delete all the records and run
this job, the count is correct, but when I run the job again, we start seeing
a higher count and duplicate records (random records) in shards.

Also, this started happening after one of our ZooKeeper nodes died due to a hardware
issue and we had to set up a new ZooKeeper machine, update the config on all
the Solr machines and restart the cloud.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Inconsistent-numFound-in-SC-when-querying-core-directly-tp4105009p4324937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, there have been a lot of changes to configs in Solr 6. All the
streaming request handlers have now been made implicit so the
solrconfig.xml doesn't include them. Something seems to be stepping on the
implicit configs.
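
If an explicit 5.x-style /export handler was carried over in solrconfig.xml,
it would look roughly like the old example below, and an explicit definition
will override the 6.x implicit one, so it's worth checking for and removing:

  <requestHandler name="/export" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="rq">{!xport}</str>
      <str name="wt">xsort</str>
      <str name="distrib">false</str>
    </lst>
  </requestHandler>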

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 12:20 PM, Zheng Lin Edwin Yeo 
wrote:

> Could it be because the solrconfig.xml was created in Solr 5.x, and was
> upgraded to Solr 6.x, and there is something which I have missed out during
> the upgrading?
>
> So far for this server, only the schema.xml and solrconfig.xml was carried
> forward and modified from Solr 5.x. The files for Solr 6.4.1 were
> downloaded directly from the Solr website, and the index were indexed
> directly in Solr 6.4.1.
>
> Regards,
> Edwin
>
>
> On 14 March 2017 at 23:53, Joel Bernstein  wrote:
>
> > Yeah, something is wrong with the configuration, because /export only
> > should be returning json. Have you changed the configurations?
> >
> > What were the exact steps you used in setting up the server?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > This is what get from query:
> > >
> > > 
> > > 
> > > true
> > > 0
> > > 0
> > > 
> > > 
> > > 
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 14 March 2017 at 22:33, Joel Bernstein  wrote:
> > >
> > > > try running the following query:
> > > >
> > > > http://localhost:8983/solr/email/export?q={!terms+f%3Dfrom}ed...@mail.com
> > > > &distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=2.2
> > > >
> > > > Let's see what comes back from this.
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <
> > > > edwinye...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > I have only managed to find these above the stack trace.
> > > > >
> > > > > 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> > > > > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> > > > > params={wt=json&_=1489500479108=0} status=0 QTime=0
> > > > > 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email
> s:shard1
> > > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > > path=/stream
> > > > > params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> > > > > >from",+gather%3D"to")}
> > > > > status=0 QTime=0
> > > > > 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email
> s:shard1
> > > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > > path=/export
> > > > > params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> > > > > fl=from,to=to+asc,from+asc=json=2.2}
> > > > > hits=2471 status=0 QTime=19
> > > > > 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email
> s:shard1
> > > > > r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> > > > > java.lang.RuntimeException: java.util.concurrent.
> ExecutionException:
> > > > > java.lang.RuntimeException: java.io.IOException:
> > > > > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > > > > http://localhost:8983/solr/email/: An exception has occurred on
> the
> > > > > server,
> > > > > refer to server log for details.
> > > > >
> > > > > I am getting these logs from {solrHome}/server/log. Is this the
> > correct
> > > > > folder to get the log, or is there another folder which may contain
> > the
> > > > > error?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 14 March 2017 at 21:47, Joel Bernstein 
> > wrote:
> > > > >
> > > > > > You're getting json parse errors, that look like your getting an
> > XML
> > > > > > response. Do you see any errors in the logs other then the stack
> > > > trace. I
> > > > > > suspect there might be another error above the stack trace which
> > > shows
> > > > > the
> > > > > > error from the server that causing it to respond with XML.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > > > > > edwinye...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Joel,
> > > > > > >
> > > > > > > >One thing it could be is that gatherNodes will only work on
> > single
> > > > > value
> > > > > > > >fields currently.
> > > > > > >
> > > > > > > Regarding this, the fields which I am using in the query is
> > > already a
> > > > > > > single value field, not multi-value field.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Edwin
> > > > > > >
> > > > > > >
> > > > > > > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Joel,
> > > > > > > >
> > > > > > > > This is the details which I get form the logs.
> > > > > > > >
> > > > > 

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Erick Erickson
Yeah, it's a little confusing. But SolrQueryRequestBase.getSearcher
calls, in turn, core.getSearcher which explicitly says in the
javadocs:

* If returnSearcher==true then a SolrIndexSearcher will be returned with
* the reference count incremented.  It must be decremented when no
longer needed.

See a similar pattern in IndexFetcher:

  searcher = core.getSearcher(true, true, waitSearcher, true);
try {
   blah blah blah
} finally {
  if (searcher != null) {
searcher.decref();
  }
  core.close();
}

This is so fundamental to Solr operating at _all_ that I'd lay long
odds this is just confusing, not a bug or everybody would be running
out of file handles.
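
In other words, whoever takes the reference has to give it back; a minimal
sketch of that pattern, taking the searcher from the core directly as
IndexFetcher does above:

  RefCounted<SolrIndexSearcher> holder = req.getCore().getSearcher();
  try {
    SolrIndexSearcher searcher = holder.get();
    // do stuff with the searcher
  } finally {
    holder.decref(); // balances the increment done by getSearcher()
  }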

Best,
Erick

On Tue, Mar 14, 2017 at 9:23 AM, Elodie Sannier
 wrote:
> The request close() method decrements the reference count on the searcher.
>
> public abstract class SolrQueryRequestBase implements SolrQueryRequest,
> Closeable {
>
>   // The index searcher associated with this request
>   protected RefCounted searcherHolder;
>
>   public void close() {
> if(this.searcherHolder != null) {
>   this.searcherHolder.decref();
>   this.searcherHolder = null;
> }
>   }
> }
>
> RefCounted keeps track of a reference count on the searcher and closes
> it when the count hits zero.
>
> public abstract class RefCounted {
>   ...
>   public void decref() {
> if (refcount.decrementAndGet() == 0) {
>   close();
> }
>   }
> }
>
> We asume that when we call req.getSearcher() - this increases the
> reference count, after we are done with the searcher, we have to call
> close() to call decref() to decrease the reference count.
>
> But it does not seem enough or maybe there is a bug in Solr in this case ?
>
> Elodie
>
> On 03/14/2017 03:02 PM, Shawn Heisey wrote:
>>
>> On 3/14/2017 3:08 AM, Gerald Reinhart wrote:
>>>
>>> Hi,
>>> The custom code we have is something like this :
>>> public class MySearchHandlerextends SearchHandler {
>>> @Override public void handleRequestBody(SolrQueryRequest req,
>>> SolrQueryResponse rsp)throws Exception {
>>>  SolrIndexSearcher  searcher =req.getSearcher();
>>>  try{
>>>   // Do stuff with the searcher
>>>  }finally {
>>>  req.close();
>>>  }
>>
>> 
>>>
>>>   Despite the fact that we always close the request each time we get
>>> a SolrIndexSearcher from the request, the number of SolrIndexSearcher
>>> instances is increasing. Each time a new commit is done on the index, a
>>> new Searcher is created (this is normal) but the old one remains. Is
>>> there something wrong with this custom code ?
>>
>> My understanding of Solr and Lucene internals is rudimentary, but I
>> might know what's happening here.
>>
>> The code closes the request, but never closes the searcher.  Searcher
>> objects include a Lucene object that holds onto the index files that
>> pertain to that view of the index.  The searcher must be closed.
>>
>> It does look like if you close the searcher and then close the request,
>> that might be enough to fully decrement all the reference counters
>> involved, but I do not know the code well enough to be sure of that.
>>
>> Thanks,
>> Shawn
>>
>
>
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 158 Ter Rue du Temple 75003 Paris
> 425 093 069 RCS Paris
>
This message and any attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.


Re: Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
Hi, Joel. Thanks for the reply.

So, I need to do some graph traversal queries for my use case. In my data
set, I have concepts and events.

concept : {name, address, bio ..},
> event: {name, date, participantIds:[concept1, concept2...] .}


Events connect two or more concepts. So, this is graph data where
concepts are connected to each other via events. Each event stores links to
the concepts that it connects.
multivalued. This is a natural structure for my data on which I wanted to
do some advanced graph traversal queries with some streaming expression.
However, gatherNodes() function does not support multivalued fields yet.
So, I changed my index structure to be something like this.

concept : {conceptId, name, address, bio ..},
> event: {eventId, name, date, participantIds:[concept1, concept2...] .}
> *create eventLink documents for each participantId in each
> event
> eventLink:{eventid, conceptid, id}



I created eventLink documents from each event so that I can traverse the
data using gatherNodes() function. With this change, I was able to do graph
query and get Ids of concepts which I wanted. However, I only have ids of
concepts. Now, using these ids, I want additional data from concept
documents like concept_name or address or bio.  This is what I was trying
to achieve with fetch() function but it seems I hit the multivalued
limitation again :) The reason why I am storing only the ids in eventLink
documents is because I don't want to duplicate data unnecessarily. It will
complicate maintenance of consistency in index when delete/update happens.
Is there any way I can achieve this?

Thanks!
Pratik





On Tue, Mar 14, 2017 at 11:24 AM, Joel Bernstein  wrote:

> Wow that's an interesting expression!
>
> The problem is that you are trying to fetch using the ancestors field,
> which is multi-valued. fetch doesn't support multi-value join keys. I never
> thought someone might try to do that.
>
> So , your attempting to get the concept names for ancestors?
>
> Can you explain a little more about the use case?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel 
> wrote:
>
> > I have two types of documents in my index. eventLink and concepttData.
> >
> > eventLink  { ancestors:[,] }
> > conceptData-{ id:id1, conceptid, concept_name . }
> >
> > Both are in same collection.
> > In my query, I am doing a gatherNodes query wrapped in some other
> function
> > and ultimately I am getting a bunch of eventLink documents. Now, I am
> > trying to get conceptData document for each id specified in eventLink's
> > ancestors field. I am trying to do that using fetch() function. Here is
> > simplified form of my query.
> >
> > fetch(collection1,
> > >  function to get eventLinks,
> > >   fl="concept_name",
> > >   on="ancestors=conceptid"
> > > )
> >
> >
> > On executing this query, I am getting back same set of documents which
> are
> > results of my streaming expression containing gatherNodes() function. No
> > fields are added to the tuples. From documentation, it seems like fetch
> > would fetch additional data and add it to the tuples. However, that is
> not
> > happening. Resulting tuples does not have concept_name field in them.
> What
> > am I missing here? I really need to get this additional data from one
> solr
> > query so that I don't have to iterate over the eventLinks and get
> > additional data by individual queries. That would badly impact
> performance.
> > Any suggestions?
> >
> > Here is my actual query and the response.
> >
> >
> > fetch(collection1,
> > >  having(
> > > gatherNodes(collection1,
> > > search(collection1,q="*:*",fl="conceptid",sort="conceptid
> > > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"
> > Prospects2",
> > > qt="/export"),
> > > walk=conceptid->eventParticipantID,
> > > gather="eventID",
> > > trackTraversal="true", scatter="leaves",
> > > count(*)
> > > ),
> > > gt(count(*),1)
> > > ),
> > > fl="concept_name",
> > > on="ancestors=conceptid"
> > > )
> >
> >
> >
> > Response :
> >
> > {
> > > "result-set": {
> > > "docs": [
> > > {
> > > "node": "524f03355056c8b53b4ed199",
> > > "field": "eventID",
> > > "level": 1,
> > > "count(*)": 2,
> > > "collection": "collection1",
> > > "ancestors": [
> > > "524f02845056c8b53b4e9871",
> > > "524f02755056c8b53b4e9269"
> > > ]
> > > },
> > > .
> > > }
> >
> >
> > Thanks,
> > Pratik
> >
>


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Elodie Sannier

The request close() method decrements the reference count on the searcher.

public abstract class SolrQueryRequestBase implements SolrQueryRequest,
Closeable {

  // The index searcher associated with this request
  protected RefCounted searcherHolder;

  public void close() {
if(this.searcherHolder != null) {
  this.searcherHolder.decref();
  this.searcherHolder = null;
}
  }
}

RefCounted keeps track of a reference count on the searcher and closes
it when the count hits zero.

public abstract class RefCounted {
  ...
  public void decref() {
if (refcount.decrementAndGet() == 0) {
  close();
}
  }
}

We assume that when we call req.getSearcher() this increases the
reference count; after we are done with the searcher, we have to call
close(), which calls decref() to decrease the reference count.

But that does not seem to be enough, or maybe there is a bug in Solr in this case?

Elodie

On 03/14/2017 03:02 PM, Shawn Heisey wrote:

On 3/14/2017 3:08 AM, Gerald Reinhart wrote:

Hi,
The custom code we have is something like this :
public class MySearchHandler extends SearchHandler {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrIndexSearcher searcher = req.getSearcher();
    try {
      // Do stuff with the searcher
    } finally {
      req.close();
    }



  Despite the fact that we always close the request each time we get
a SolrIndexSearcher from the request, the number of SolrIndexSearcher
instances is increasing. Each time a new commit is done on the index, a
new Searcher is created (this is normal) but the old one remains. Is
there something wrong with this custom code ?

My understanding of Solr and Lucene internals is rudimentary, but I
might know what's happening here.

The code closes the request, but never closes the searcher.  Searcher
objects include a Lucene object that holds onto the index files that
pertain to that view of the index.  The searcher must be closed.

It does look like if you close the searcher and then close the request,
that might be enough to fully decrement all the reference counters
involved, but I do not know the code well enough to be sure of that.

Thanks,
Shawn




Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

This message and any attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.


Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Could it be because the solrconfig.xml was created in Solr 5.x and was
upgraded to Solr 6.x, and there is something I missed during
the upgrade?

So far for this server, only the schema.xml and solrconfig.xml were carried
forward and modified from Solr 5.x. The files for Solr 6.4.1 were
downloaded directly from the Solr website, and the index was built
directly in Solr 6.4.1.

Regards,
Edwin


On 14 March 2017 at 23:53, Joel Bernstein  wrote:

> Yeah, something is wrong with the configuration, because /export only
> should be returning json. Have you changed the configurations?
>
> What were the exact steps you used in setting up the server?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > This is what get from query:
> >
> > 
> > 
> > true
> > 0
> > 0
> > 
> > 
> > 
> >
> > Regards,
> > Edwin
> >
> >
> > On 14 March 2017 at 22:33, Joel Bernstein  wrote:
> >
> > > try running the following query:
> > >
> > > http://localhost:8983/solr/email/export?q={!terms+f%3Dfrom}ed...@mail.com
> > > &distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=2.2
> > >
> > > Let's see what comes back from this.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > I have only managed to find these above the stack trace.
> > > >
> > > > 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> > > > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> > > > params={wt=json&_=1489500479108=0} status=0 QTime=0
> > > > 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email s:shard1
> > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > path=/stream
> > > > params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> > > > >from",+gather%3D"to")}
> > > > status=0 QTime=0
> > > > 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email s:shard1
> > > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > > path=/export
> > > > params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> > > > fl=from,to=to+asc,from+asc=json=2.2}
> > > > hits=2471 status=0 QTime=19
> > > > 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email s:shard1
> > > > r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> > > > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > > > java.lang.RuntimeException: java.io.IOException:
> > > > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > > > http://localhost:8983/solr/email/: An exception has occurred on the
> > > > server,
> > > > refer to server log for details.
> > > >
> > > > I am getting these logs from {solrHome}/server/log. Is this the
> correct
> > > > folder to get the log, or is there another folder which may contain
> the
> > > > error?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 14 March 2017 at 21:47, Joel Bernstein 
> wrote:
> > > >
> > > > > You're getting json parse errors, that look like your getting an
> XML
> > > > > response. Do you see any errors in the logs other then the stack
> > > trace. I
> > > > > suspect there might be another error above the stack trace which
> > shows
> > > > the
> > > > > error from the server that causing it to respond with XML.
> > > > >
> > > > >
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > > > > edwinye...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > >One thing it could be is that gatherNodes will only work on
> single
> > > > value
> > > > > > >fields currently.
> > > > > >
> > > > > > Regarding this, the fields which I am using in the query is
> > already a
> > > > > > single value field, not multi-value field.
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Joel,
> > > > > > >
> > > > > > > This is the details which I get form the logs.
> > > > > > >
> > > > > > > java.lang.RuntimeException: java.util.concurrent.
> > > ExecutionException:
> > > > > > > java.lang.RuntimeException: java.io.IOException:
> > > > java.util.concurrent.
> > > > > > ExecutionException:
> > > > > > > java.io.IOException: --> http://localhost:8984/solr/email/: An
> > > > > exception
> > > > > > > has occurred on the server, refer to server log for details.
> > > > > > > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > > > > > > read(GatherNodesStream.java:600)
> > > > > > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > > > > > read(ExceptionStream.java:68)
> > > > > > > at 

Re: Need help with date boost

2017-03-14 Thread Erick Erickson
Rick:

Hmmm, try this:
https://cwiki.apache.org/confluence/display/solr/Function+Queries.
It's not quite as explicit, but it's the latest document.

 Essentially that's what the function on that page does, something
like "recip(rord(creationDate),1,1000,1000)". The "recip" function is
actually recip(x,m,a,b), implementing a/(m*x+b). So by playing with the
constants, you can boost on a sliding scale where there is more
discrimination between more recent docs, i.e. a greater difference in
the boost applied to a doc now vs. a week ago than between 100 and
101 weeks ago...

Or any other combinations of math functions you want to use for your
boost, that page has a bunch.
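
For the created-date fallback from the original question, something along
these lines should work as an edismax boost (the date field names are just
placeholders for whatever the update/created date fields are called):

  boost=if(exists(update_date),
           recip(ms(NOW/DAY,update_date),3.16e-11,1,1),
           recip(ms(NOW/DAY,created_date),3.16e-11,1,1))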

Best,
Erick

On Tue, Mar 14, 2017 at 7:28 AM, Rick Leir  wrote:
> Hi Erick
> We have ten year old documents and new ones which often score about the same 
> just based on default similarity. But the newer ones are much more relevant 
> in our case.
>
> Suppose we de-boost proportionally to (NOW/YEAR - modifiedDate/YEAR). Thanks 
> for the link you provided, it had not jumped out at me. Is the syntax up to 
> date (no pun!)?
>
> Walter, how do you calculate this logarithmically?
> Thanks, Rick
>
> On March 14, 2017 1:21:52 AM EDT, Erick Erickson  
> wrote:
>>first I think the requirement is a bad one. Why should a document with
>>low relevance 29 days ago score higher than the perfect document from
>>31 days ago? That doesn't seem like it serves the user very well...
>>
>>And then "However in cases where update date is unavailable I need to
>>sort it using created date." Does that mean if a doc has no update
>>date but does have a created date 10 days ago you want to sort it
>>before any docs older than 30 days? If so the simplest bit here would
>>be to insure that any incoming doc has an update date by copying the
>>created date into the update field if there is no update field.
>>
>>Now, all that aside, you sort by date with function queries. And
>>function queries can have if clauses. So your sort function will be a
>>big long set of if statements giving a boost to docs in your ranges.
>>Or you can make your own function query plugin that would undoubtedly
>>be more efficient. See the ref guide for "function queries".
>>
>>But again, I think this is a case where if you present the people who
>>came up with this requirement with an explanation of the effects, you
>>may just be able to sort by giving weight to more recent documents and
>>be done with it. See:
>>https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>>
>>Best,
>>Erick
>>
>>On Mon, Mar 13, 2017 at 7:17 PM, Atita Arora 
>>wrote:
>>> Hi all,
>>>
>>> I am trying to resolve a problem here where I have to fiddle around
>>with
>>> set of dates ( created and updated date).
>>> My use is that I have to make sure that the document with latest
>>(recent)
>>>  update date should come higher in my search results.
>>> Precisely,  I am required to maintain 3 buckets wherein documents
>>with
>>> updated date falling in range of last 30 days should have maximum
>>weight,
>>> followed by update date in 60 and 90 and the rest.
>>> However in cases where update date is unavailable I need to sort it
>>using
>>> created date.
>>> I am not sure how do I achieve this.
>>> Any insights here would be a great help.
>>> Thanks in advance.
>>> Regards,
>>> Atita
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, something is wrong with the configuration, because /export should
only be returning JSON. Have you changed the configurations?

What were the exact steps you used in setting up the server?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi Joel,
>
> This is what get from query:
>
> 
> 
> true
> 0
> 0
> 
> 
> 
>
> Regards,
> Edwin
>
>
> On 14 March 2017 at 22:33, Joel Bernstein  wrote:
>
> > try running the following query:
> >
> > http://localhost:8983/solr/email/export?{!terms+f%3Dfrom}ed...@mail.com
> > =false=from,to=to+asc,from+asc=json=2.2
> >
> > Let's see what comes back from this.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > I have only managed to find these above the stack trace.
> > >
> > > 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> > > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> > > params={wt=json&_=1489500479108=0} status=0 QTime=0
> > > 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email s:shard1
> > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > path=/stream
> > > params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> > > >from",+gather%3D"to")}
> > > status=0 QTime=0
> > > 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email s:shard1
> > > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> > path=/export
> > > params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> > > fl=from,to=to+asc,from+asc=json=2.2}
> > > hits=2471 status=0 QTime=19
> > > 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email s:shard1
> > > r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> > > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > > java.lang.RuntimeException: java.io.IOException:
> > > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > > http://localhost:8983/solr/email/: An exception has occurred on the
> > > server,
> > > refer to server log for details.
> > >
> > > I am getting these logs from {solrHome}/server/log. Is this the correct
> > > folder to get the log, or is there another folder which may contain the
> > > error?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 14 March 2017 at 21:47, Joel Bernstein  wrote:
> > >
> > > > You're getting json parse errors, that look like your getting an XML
> > > > response. Do you see any errors in the logs other then the stack
> > trace. I
> > > > suspect there might be another error above the stack trace which
> shows
> > > the
> > > > error from the server that causing it to respond with XML.
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > > > edwinye...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > >One thing it could be is that gatherNodes will only work on single
> > > value
> > > > > >fields currently.
> > > > >
> > > > > Regarding this, the fields which I am using in the query is
> already a
> > > > > single value field, not multi-value field.
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > This is the details which I get form the logs.
> > > > > >
> > > > > > java.lang.RuntimeException: java.util.concurrent.
> > ExecutionException:
> > > > > > java.lang.RuntimeException: java.io.IOException:
> > > java.util.concurrent.
> > > > > ExecutionException:
> > > > > > java.io.IOException: --> http://localhost:8984/solr/email/: An
> > > > exception
> > > > > > has occurred on the server, refer to server log for details.
> > > > > > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > > > > > read(GatherNodesStream.java:600)
> > > > > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > > > > read(ExceptionStream.java:68)
> > > > > > at org.apache.solr.handler.StreamHandler$TimerStream.
> > > > > > read(StreamHandler.java:479)
> > > > > > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> > > > > > writeMap$0(TupleStream.java:67)
> > > > > > at org.apache.solr.response.JSONWriter.writeIterator(
> > > > > > JSONResponseWriter.java:523)
> > > > > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > > > > TextResponseWriter.java:175)
> > > > > > at org.apache.solr.response.JSONWriter$2.put(
> > > > > JSONResponseWriter.java:559)
> > > > > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > > > > > writeMap(TupleStream.java:64)
> > > > > > at org.apache.solr.response.JSONWriter.writeMap(
> > > > > > JSONResponseWriter.java:547)
> > > > > > at 

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel,

This is what I get from the query:



true
0
0




Regards,
Edwin


On 14 March 2017 at 22:33, Joel Bernstein  wrote:

> try running the following query:
>
> http://localhost:8983/solr/email/export?{!terms+f%3Dfrom}ed...@mail.com
> =false=from,to=to+asc,from+asc=json=2.2
>
> Let's see what comes back from this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > I have only managed to find these above the stack trace.
> >
> > 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> > params={wt=json&_=1489500479108=0} status=0 QTime=0
> > 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email s:shard1
> > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> path=/stream
> > params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> > >from",+gather%3D"to")}
> > status=0 QTime=0
> > 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email s:shard1
> > r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr
> path=/export
> > params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> > fl=from,to=to+asc,from+asc=json=2.2}
> > hits=2471 status=0 QTime=19
> > 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email s:shard1
> > r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > java.lang.RuntimeException: java.io.IOException:
> > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > http://localhost:8983/solr/email/: An exception has occurred on the
> > server,
> > refer to server log for details.
> >
> > I am getting these logs from {solrHome}/server/log. Is this the correct
> > folder to get the log, or is there another folder which may contain the
> > error?
> >
> > Regards,
> > Edwin
> >
> >
> > On 14 March 2017 at 21:47, Joel Bernstein  wrote:
> >
> > > You're getting json parse errors, that look like your getting an XML
> > > response. Do you see any errors in the logs other then the stack
> trace. I
> > > suspect there might be another error above the stack trace which shows
> > the
> > > error from the server that causing it to respond with XML.
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > > edwinye...@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > >One thing it could be is that gatherNodes will only work on single
> > value
> > > > >fields currently.
> > > >
> > > > Regarding this, the fields which I am using in the query is already a
> > > > single value field, not multi-value field.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo  >
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > This is the details which I get form the logs.
> > > > >
> > > > > java.lang.RuntimeException: java.util.concurrent.
> ExecutionException:
> > > > > java.lang.RuntimeException: java.io.IOException:
> > java.util.concurrent.
> > > > ExecutionException:
> > > > > java.io.IOException: --> http://localhost:8984/solr/email/: An
> > > exception
> > > > > has occurred on the server, refer to server log for details.
> > > > > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > > > > read(GatherNodesStream.java:600)
> > > > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > > > read(ExceptionStream.java:68)
> > > > > at org.apache.solr.handler.StreamHandler$TimerStream.
> > > > > read(StreamHandler.java:479)
> > > > > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> > > > > writeMap$0(TupleStream.java:67)
> > > > > at org.apache.solr.response.JSONWriter.writeIterator(
> > > > > JSONResponseWriter.java:523)
> > > > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > > > TextResponseWriter.java:175)
> > > > > at org.apache.solr.response.JSONWriter$2.put(
> > > > JSONResponseWriter.java:559)
> > > > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > > > > writeMap(TupleStream.java:64)
> > > > > at org.apache.solr.response.JSONWriter.writeMap(
> > > > > JSONResponseWriter.java:547)
> > > > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > > > TextResponseWriter.java:193)
> > > > > at org.apache.solr.response.JSONWriter.
> writeNamedListAsMapWithDups(
> > > > > JSONResponseWriter.java:209)
> > > > > at org.apache.solr.response.JSONWriter.writeNamedList(
> > > > > JSONResponseWriter.java:325)
> > > > > at org.apache.solr.response.JSONWriter.writeResponse(
> > > > > JSONResponseWriter.java:120)
> > > > > at org.apache.solr.response.JSONResponseWriter.write(
> > > > > JSONResponseWriter.java:71)
> > > > > at org.apache.solr.response.QueryResponseWriterUtil.
> > > writeQueryResponse(
> > > > > 

Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread Erick Erickson
Then you're probably going to write your own Collector if you need to
see each document and do something different with it. Do be aware that
you _really_ need to be sure you get your values from docValues
fields. If you use the simple get(docId).getField() method the stored
fields will be read from disk and decompressed in order to be returned
which will be a performance killer. docValues fields _may_ be
automatic if you use useDocValuesAsStored=true for the fields you want
to examine.

WARNING: I haven't been into the code for useDocValuesAsStored so I'm
not at all clear whether it's really automatic at the collector level.
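
To make that concrete, here's a very rough, untested sketch of a
collector that reads a numeric docValues field for each matching doc
(the field name and the per-doc work are placeholders, and this is
against the Lucene 6.x docValues API):

import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleCollector;

public class MyDocValuesCollector extends SimpleCollector {
  private NumericDocValues values;

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // pull the per-segment docValues once per leaf, not once per document
    values = DocValues.getNumeric(context.reader(), "price_l");
  }

  @Override
  public void collect(int doc) throws IOException {
    long v = values.get(doc);  // random-access API in Lucene 6.x
    // ... do whatever your component needs with v for this matching doc ...
  }

  @Override
  public boolean needsScores() {
    return false;  // skip scoring if you only need field values
  }
}

You'd then drive it from your component with something like
searcher.search(query, new MyDocValuesCollector()).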

Best,
Erick

On Tue, Mar 14, 2017 at 1:15 AM, alexpusch  wrote:
> Single field. I'm iterating over the results once, and need each doc in
> memory only for that single iteration. I need different fields from each doc
> according to the algorithm state.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Iterating-sorted-result-docs-in-a-custom-search-component-tp4324497p4324818.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
Wow that's an interesting expression!

The problem is that you are trying to fetch using the ancestors field,
which is multi-valued. fetch doesn't support multi-value join keys. I never
thought someone might try to do that.

So, you're attempting to get the concept names for the ancestors?

Can you explain a little more about the use case?
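
For reference, fetch currently expects a single-valued tuple field as
the join key. A rough sketch (collection and field names below are only
placeholders) would look like:

fetch(collection1,
      search(collection1, q="*:*", fl="id,conceptid", sort="id asc", qt="/export"),
      fl="concept_name",
      on="conceptid=conceptid")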


Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel  wrote:

> I have two types of documents in my index. eventLink and concepttData.
>
> eventLink  { ancestors:[,] }
> conceptData-{ id:id1, conceptid, concept_name . }
>
> Both are in same collection.
> In my query, I am doing a gatherNodes query wrapped in some other function
> and ultimately I am getting a bunch of eventLink documents. Now, I am
> trying to get conceptData document for each id specified in eventLink's
> ancestors field. I am trying to do that using fetch() function. Here is
> simplified form of my query.
>
> fetch(collection1,
> >  function to get eventLinks,
> >   fl="concept_name",
> >   on="ancestors=conceptid"
> > )
>
>
> On executing this query, I am getting back same set of documents which are
> results of my streaming expression containing gatherNodes() function. No
> fields are added to the tuples. From documentation, it seems like fetch
> would fetch additional data and add it to the tuples. However, that is not
> happening. Resulting tuples does not have concept_name field in them. What
> am I missing here? I really need to get this additional data from one solr
> query so that I don't have to iterate over the eventLinks and get
> additional data by individual queries. That would badly impact performance.
> Any suggestions?
>
> Here is my actual query and the response.
>
>
> fetch(collection1,
> >  having(
> > gatherNodes(collection1,
> > search(collection1,q="*:*",fl="conceptid",sort="conceptid
> > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"
> Prospects2",
> > qt="/export"),
> > walk=conceptid->eventParticipantID,
> > gather="eventID",
> > trackTraversal="true", scatter="leaves",
> > count(*)
> > ),
> > gt(count(*),1)
> > ),
> > fl="concept_name",
> > on="ancestors=conceptid"
> > )
>
>
>
> Response :
>
> {
> > "result-set": {
> > "docs": [
> > {
> > "node": "524f03355056c8b53b4ed199",
> > "field": "eventID",
> > "level": 1,
> > "count(*)": 2,
> > "collection": "collection1",
> > "ancestors": [
> > "524f02845056c8b53b4e9871",
> > "524f02755056c8b53b4e9269"
> > ]
> > },
> > .
> > }
>
>
> Thanks,
> Pratik
>


Re: managing active/passive cores in Solr and Haystack

2017-03-14 Thread Erick Erickson
I don't know much about HAYSTACK, but for the Solr URL you probably
want the "shards" parameter for searching, see:
https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding

And just use the specific core you care about for update requests.
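
For example (host and core names below are only placeholders), a search
spanning the two cores could look like:

http://localhost:8983/solr/core_Feb/select?q=*:*&shards=localhost:8983/solr/core_Feb,localhost:8983/solr/core_Jan

while updates would go only to the /update handler of the currently
active core.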

But I would suggest that you can have Solr do much of this work,
specifically SolrCloud with "implicit" routing. Combine that with
"collection aliasing" and I think you have what you need with a single
Solr URL. "implicit" routing allows you to send docs to a particular
shard based on the value of a particular field. You can add/remove
shards at will (only with the "implicit" router, not with the default
compositeID router). Etc.

I've skimmed over lots of details here; I just didn't want you to be
unaware that a solution exists (see "time series data" in the
literature).
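
Very roughly, and with every name below just a placeholder, the
Collections API calls involved look something like this.

Create a collection with one shard per month, routed by a "month" field:
http://localhost:8983/solr/admin/collections?action=CREATE&name=events&router.name=implicit&router.field=month&shards=jan,feb&replicationFactor=1&collection.configName=events_conf

Later, add a shard for a new month:
http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=events&shard=mar

And keep clients pointed at a stable alias that you re-point as needed:
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=events_search&collections=events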

Best,
Erick

On Tue, Mar 14, 2017 at 8:06 AM, serwah sabetghadam
 wrote:
> Hi all,
>
> I am totally new to this group and of course so happy to join:)
> So my question may be repetitive but I did not find how to search all
> previous questions.
>
>
> problem in one sentence:
> to read from multiple cores (archive and active ones), write only to the
> latest active core
> using Solr and Haystack
>
>
> I am designing a periodic indexing system, one core per months, of which
> always the last two indexes are used to search on, and always the last one
> is the active one for current indexing.
>
>
> We are using Haystack to manage the communications with Solr.
> We can use multiple cores in the settings.py in Haystack, that is totally
> fine.
> The problem is that in this case, as I have tested, both cores are getting
> updated for new indexing.
>
> Then I decided to use the "--using" parameter of Haystack to select which
> backend to use for updating the index, sth like:
>
> ./manage.py update_index events.Event --age=24 --workers=4 --using=default
>
> that in default part in the settigns.py file I have defined the active
> core.
> HAYSTACK_CONNECTIONS = {
> 'default': {
> 'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
>  'URL': 'http://127.0.0.1:8983/solr/core_Feb',
>   },
>  'slave':{
>   'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
>   'URL': 'http://127.0.0.1:8983/solr/core_Jan',
>  },
>  }
>
> here core_Feb is the active core, or is going to be the active core.
>
> then now I am not sure this way it will read from both. Now I can manage
> the write part, but again problem with reading from multiple cores. What I
> tested before for reading from multiple cores was like :
>
> HAYSTACK_CONNECTIONS = {
> 'default': {
> 'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
>  'URL': 'http://127.0.0.1:8983/solr/core_Feb',
>   'URL': 'http://127.0.0.1:8983/solr/core_Jan',
>  },
>  }
>
>
> but in this case it will write in both! that I want to write only in the
> core_Feb one.
>
> Any help is highly appreciated,
> Best,
> Serwah


Re: Suggestions from different dictionaries dynamically

2017-03-14 Thread Alexandre Rafalovitch
Are you actually using the spell checker functionality? If so, could you
provide the solrconfig.xml segment showing what that configuration looks
like?

Or, if you are just using plain search, what is your default 'df' field?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 14 March 2017 at 08:24, vuppalasubbarao  wrote:
> Hi,
>
> We have two field names "teacher_field" and "school_field" along with other
> fields like "source". We have created single dictionary from both these
> fields.  When I am searching with misspelling of "teacher_field", I also get
> the spelling suggestions from "school_field". Instead I have to get
> suggestions only from schoolfield alone if I can pass additional filter
> field like "source".
>
> In this case, is there a possibility I can create two dictionaries and use
> one of them at query time based on my other query field[fq=source:"TEACHER"
> or fq=source:"COLLEGE" ].
>
> */SAMPLE DOCS:/*
> 
> 1
> michael
> newyork
> teacher
> 
> 
> 2
> rajan
> michigan
> school
> 
>
> /*QUERY:*/
> q=michial=source:teacher   ---> getting suggestion as "michigan". Instead
> I expect "michael"
>
> /*SCHEMA.XML*/
>  
>  
>
>
>
>
>
> Thanks,
> Subbarao
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Suggestions-from-different-dictionaries-dynamically-tp4324864.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Ok. I updated the other thread with a URL to run based on what I've seeing
in the logs. Try running that URL and let's see what comes back.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 10:26 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi Joel,
>
> This is a standard Solr 6.4.1 install. I got the same error even after I
> upgrade it to Solr 6.4.2.
>
> Regards,
> Edwin
>
>
> On 14 March 2017 at 21:30, Joel Bernstein  wrote:
>
> > Looks like there might be something strange with your configuration. Did
> > you upgrade an existing install or is this a standard Solr 6.4.1 install?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Mar 14, 2017 at 6:22 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > I tired to run the following Streaming query with the shortestPath
> Stream
> > > Source.
> > >
> > > http://localhost:8983/solr/email/stream?expr=shortestPath(email,
> > >   from="ed...@mail.com",
> > > to="ad...@mail.com",
> > >   edge="from_address=to_address",
> > >   threads="6",
> > >   partitionSize="300",
> > >   maxDepth="4")=true
> > >
> > > I get the following error in the output:
> > >
> > > {
> > >   "result-set":{
> > > "docs":[{
> > > "EXCEPTION":"java.util.concurrent.ExecutionException:
> > > java.lang.RuntimeException: java.io.IOException:
> > > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > > http://localhost:8983/solr/email/: An exception has occurred on the
> > > server,
> > > refer to server log for details.",
> > > "EOF":true,
> > > "RESPONSE_TIME":112}]}}
> > >
> > >
> > > Here is the logs from the error log:
> > >
> > > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > > java.lang.RuntimeException: java.io.IOException:
> > > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > > http://localhost:8983/solr/email/: An exception has occurred on the
> > > server,
> > > refer to server log for details.
> > > at
> > > org.apache.solr.client.solrj.io.graph.ShortestPathStream.
> > > open(ShortestPathStream.java:385)
> > > at
> > > org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > open(ExceptionStream.java:51)
> > > at
> > > org.apache.solr.handler.StreamHandler$TimerStream.
> > > open(StreamHandler.java:457)
> > > at
> > > org.apache.solr.client.solrj.io.stream.TupleStream.
> > > writeMap(TupleStream.java:63)
> > > at org.apache.solr.response.JSONWriter.writeMap(
> > > JSONResponseWriter.java:547)
> > > at
> > > org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:193)
> > > at
> > > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > > JSONResponseWriter.java:209)
> > > at
> > > org.apache.solr.response.JSONWriter.writeNamedList(
> > > JSONResponseWriter.java:325)
> > > at
> > > org.apache.solr.response.JSONWriter.writeResponse(
> > > JSONResponseWriter.java:120)
> > > at
> > > org.apache.solr.response.JSONResponseWriter.write(
> > > JSONResponseWriter.java:71)
> > > at
> > > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > > QueryResponseWriterUtil.java:65)
> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > > HttpSolrCall.java:732)
> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > > at
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:345)
> > > at
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:296)
> > > at
> > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > > doFilter(ServletHandler.java:1691)
> > > at
> > > org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > ServletHandler.java:582)
> > > at
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:143)
> > > at
> > > org.eclipse.jetty.security.SecurityHandler.handle(
> > > SecurityHandler.java:548)
> > > at
> > > org.eclipse.jetty.server.session.SessionHandler.
> > > doHandle(SessionHandler.java:226)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandler.
> > > doHandle(ContextHandler.java:1180)
> > > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > > ServletHandler.java:512)
> > > at
> > > org.eclipse.jetty.server.session.SessionHandler.
> > > doScope(SessionHandler.java:185)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandler.
> > > doScope(ContextHandler.java:1112)
> > > at
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:141)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > > ContextHandlerCollection.java:213)
> > > at
> > > org.eclipse.jetty.server.handler.HandlerCollection.
> > > handle(HandlerCollection.java:119)
> > > at
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > > HandlerWrapper.java:134)
> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> 

Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
I have two types of documents in my index: eventLink and conceptData.

eventLink  { ancestors:[,] }
conceptData-{ id:id1, conceptid, concept_name . }

Both are in same collection.
In my query, I am doing a gatherNodes query wrapped in some other function
and ultimately I am getting a bunch of eventLink documents. Now, I am
trying to get conceptData document for each id specified in eventLink's
ancestors field. I am trying to do that using fetch() function. Here is
simplified form of my query.

fetch(collection1,
>  function to get eventLinks,
>   fl="concept_name",
>   on="ancestors=conceptid"
> )


On executing this query, I am getting back the same set of documents that
are the results of my streaming expression containing the gatherNodes()
function. No fields are added to the tuples. From the documentation, it
seems like fetch should fetch additional data and add it to the tuples.
However, that is not happening: the resulting tuples do not have the
concept_name field in them. What am I missing here? I really need to get
this additional data from one Solr query so that I don't have to iterate
over the eventLinks and fetch the additional data with individual queries,
which would badly impact performance. Any suggestions?

Here is my actual query and the response.


fetch(collection1,
>  having(
> gatherNodes(collection1,
> search(collection1,q="*:*",fl="conceptid",sort="conceptid
> asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Company",fq=tags:"Prospects2",
> qt="/export"),
> walk=conceptid->eventParticipantID,
> gather="eventID",
> trackTraversal="true", scatter="leaves",
> count(*)
> ),
> gt(count(*),1)
> ),
> fl="concept_name",
> on="ancestors=conceptid"
> )



Response :

{
> "result-set": {
> "docs": [
> {
> "node": "524f03355056c8b53b4ed199",
> "field": "eventID",
> "level": 1,
> "count(*)": 2,
> "collection": "collection1",
> "ancestors": [
> "524f02845056c8b53b4e9871",
> "524f02755056c8b53b4e9269"
> ]
> },
> .
> }


Thanks,
Pratik


managing active/passive cores in Solr and Haystack

2017-03-14 Thread serwah sabetghadam
Hi all,

I am totally new to this group and of course so happy to join:)
So my question may be repetitive but I did not find how to search all
previous questions.


problem in one sentence:
to read from multiple cores (archive and active ones), write only to the
latest active core
using Solr and Haystack


I am designing a periodic indexing system, one core per month, of which
always the last two indexes are used for searching, and always the last
one is the active one for current indexing.


We are using Haystack to manage the communications with Solr.
We can use multiple cores in the settings.py in Haystack, that is totally
fine.
The problem is that in this case, as I have tested, both cores are getting
updated for new indexing.

Then I decided to use the "--using" parameter of Haystack to select which
backend to use for updating the index, something like:

./manage.py update_index events.Event --age=24 --workers=4 --using=default

where in the 'default' part of the settings.py file I have defined the
active core:
HAYSTACK_CONNECTIONS = {
'default': {
'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
 'URL': 'http://127.0.0.1:8983/solr/core_Feb',
  },
 'slave':{
  'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
  'URL': 'http://127.0.0.1:8983/solr/core_Jan',
 },
 }

here core_Feb is the active core, or is going to be the active core.

Now I am not sure whether this way it will read from both. I can manage
the write part, but again there is a problem with reading from multiple
cores. What I tested before for reading from multiple cores was like:

HAYSTACK_CONNECTIONS = {
'default': {
'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
 'URL': 'http://127.0.0.1:8983/solr/core_Feb',
  'URL': 'http://127.0.0.1:8983/solr/core_Jan',
 },
 }


but in this case it will write to both, whereas I want to write only to
the core_Feb one.

Any help is highly appreciated,
Best,
Serwah


Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Toke,

After sorting by Self Time (CPU), I found that
FSDirectory$FSIndexOutput$1.write() is taking most of the CPU time, so is
the bottleneck now the I/O of the hard drive?

https://drive.google.com/open?id=0BwLcshoSCVcdb2I4U1RBNnI0OVU

On Tue, Mar 14, 2017 at 4:19 PM, Toke Eskildsen  wrote:

> On Tue, 2017-03-14 at 11:51 +0200, Mahmoud Almokadem wrote:
> > Here is the profiler screenshot from VisualVM after upgrading
> >
> > https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg
> >
> > the jetty is taking the most time on CPU. Does this mean, the jetty
> > is the bottleneck on indexing?
>
> You need to sort on and look at the "Self Time (CPU)" column in
> VisualVM, not the default "Self Time", to see where the power is used.
> The default is pretty useless for locating hot spots.
>
> - Toke Eskildsen, Royal Danish Library
>


Re: Need help with date boost

2017-03-14 Thread Rick Leir
Hi Erick
We have ten-year-old documents and new ones which often score about the same 
just based on the default similarity. But the newer ones are much more relevant 
in our case.

Suppose we de-boost proportionally to (NOW/YEAR - modifiedDate/YEAR). Thanks 
for the link you provided; it had not jumped out at me. Is the syntax up to 
date (no pun!)?

Walter, how do you calculate this logarithmically?
Thanks, Rick

On March 14, 2017 1:21:52 AM EDT, Erick Erickson  
wrote:
>first I think the requirement is a bad one. Why should a document with
>low relevance 29 days ago score higher than the perfect document from
>31 days ago? That doesn't seem like it serves the user very well...
>
>And then "However in cases where update date is unavailable I need to
>sort it using created date." Does that mean if a doc has no update
>date but does have a created date 10 days ago you want to sort it
>before any docs older than 30 days? If so the simplest bit here would
>be to insure that any incoming doc has an update date by copying the
>created date into the update field if there is no update field.
>
>Now, all that aside, you sort by date with function queries. And
>function queries can have if clauses. So your sort function will be a
>big long set of if statements giving a boost to docs in your ranges.
>Or you can make your own function query plugin that would undoubtedly
>be more efficient. See the ref guide for "function queries".
>
>But again, I think this is a case where if you present the people who
>came up with this requirement with an explanation of the effects, you
>may just be able to sort by giving weight to more recent documents and
>be done with it. See:
>https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>
>Best,
>Erick
>
>On Mon, Mar 13, 2017 at 7:17 PM, Atita Arora 
>wrote:
>> Hi all,
>>
>> I am trying to resolve a problem here where I have to fiddle around
>with
>> set of dates ( created and updated date).
>> My use is that I have to make sure that the document with latest
>(recent)
>>  update date should come higher in my search results.
>> Precisely,  I am required to maintain 3 buckets wherein documents
>with
>> updated date falling in range of last 30 days should have maximum
>weight,
>> followed by update date in 60 and 90 and the rest.
>> However in cases where update date is unavailable I need to sort it
>using
>> created date.
>> I am not sure how do I achieve this.
>> Any insights here would be a great help.
>> Thanks in advance.
>> Regards,
>> Atita

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
try running the following query:

http://localhost:8983/solr/email/export?q={!terms+f%3Dfrom}ed...@mail.com
&distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=2.2

Let's see what comes back from this.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi Joel,
>
> I have only managed to find these above the stack trace.
>
> 2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1489500479108=0} status=0 QTime=0
> 2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email s:shard1
> r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr path=/stream
> params={indent=true=gatherNodes(email,+walk%3D"edwin@mail-
> >from",+gather%3D"to")}
> status=0 QTime=0
> 2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email s:shard1
> r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr path=/export
> params={q={!terms+f%3Dfrom}ed...@mail.com=false&
> fl=from,to=to+asc,from+asc=json=2.2}
> hits=2471 status=0 QTime=19
> 2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email s:shard1
> r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException:
> java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://localhost:8983/solr/email/: An exception has occurred on the
> server,
> refer to server log for details.
>
> I am getting these logs from {solrHome}/server/log. Is this the correct
> folder to get the log, or is there another folder which may contain the
> error?
>
> Regards,
> Edwin
>
>
> On 14 March 2017 at 21:47, Joel Bernstein  wrote:
>
> > You're getting json parse errors, that look like your getting an XML
> > response. Do you see any errors in the logs other then the stack trace. I
> > suspect there might be another error above the stack trace which shows
> the
> > error from the server that causing it to respond with XML.
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > >One thing it could be is that gatherNodes will only work on single
> value
> > > >fields currently.
> > >
> > > Regarding this, the fields which I am using in the query is already a
> > > single value field, not multi-value field.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo 
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > This is the details which I get form the logs.
> > > >
> > > > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > > > java.lang.RuntimeException: java.io.IOException:
> java.util.concurrent.
> > > ExecutionException:
> > > > java.io.IOException: --> http://localhost:8984/solr/email/: An
> > exception
> > > > has occurred on the server, refer to server log for details.
> > > > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > > > read(GatherNodesStream.java:600)
> > > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > > read(ExceptionStream.java:68)
> > > > at org.apache.solr.handler.StreamHandler$TimerStream.
> > > > read(StreamHandler.java:479)
> > > > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> > > > writeMap$0(TupleStream.java:67)
> > > > at org.apache.solr.response.JSONWriter.writeIterator(
> > > > JSONResponseWriter.java:523)
> > > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > > TextResponseWriter.java:175)
> > > > at org.apache.solr.response.JSONWriter$2.put(
> > > JSONResponseWriter.java:559)
> > > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > > > writeMap(TupleStream.java:64)
> > > > at org.apache.solr.response.JSONWriter.writeMap(
> > > > JSONResponseWriter.java:547)
> > > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > > TextResponseWriter.java:193)
> > > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > > > JSONResponseWriter.java:209)
> > > > at org.apache.solr.response.JSONWriter.writeNamedList(
> > > > JSONResponseWriter.java:325)
> > > > at org.apache.solr.response.JSONWriter.writeResponse(
> > > > JSONResponseWriter.java:120)
> > > > at org.apache.solr.response.JSONResponseWriter.write(
> > > > JSONResponseWriter.java:71)
> > > > at org.apache.solr.response.QueryResponseWriterUtil.
> > writeQueryResponse(
> > > > QueryResponseWriterUtil.java:65)
> > > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > > > HttpSolrCall.java:732)
> > > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > > SolrDispatchFilter.java:345)
> > > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > > SolrDispatchFilter.java:296)
> > > > 

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel,

This is a standard Solr 6.4.1 install. I got the same error even after I
upgraded it to Solr 6.4.2.

Regards,
Edwin


On 14 March 2017 at 21:30, Joel Bernstein  wrote:

> Looks like there might be something strange with your configuration. Did
> you upgrade an existing install or is this a standard Solr 6.4.1 install?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 14, 2017 at 6:22 AM, Zheng Lin Edwin Yeo  >
> wrote:
>
> > Hi,
> >
> > I tired to run the following Streaming query with the shortestPath Stream
> > Source.
> >
> > http://localhost:8983/solr/email/stream?expr=shortestPath(email,
> >   from="ed...@mail.com",
> > to="ad...@mail.com",
> >   edge="from_address=to_address",
> >   threads="6",
> >   partitionSize="300",
> >   maxDepth="4")=true
> >
> > I get the following error in the output:
> >
> > {
> >   "result-set":{
> > "docs":[{
> > "EXCEPTION":"java.util.concurrent.ExecutionException:
> > java.lang.RuntimeException: java.io.IOException:
> > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > http://localhost:8983/solr/email/: An exception has occurred on the
> > server,
> > refer to server log for details.",
> > "EOF":true,
> > "RESPONSE_TIME":112}]}}
> >
> >
> > Here is the logs from the error log:
> >
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > java.lang.RuntimeException: java.io.IOException:
> > java.util.concurrent.ExecutionException: java.io.IOException: -->
> > http://localhost:8983/solr/email/: An exception has occurred on the
> > server,
> > refer to server log for details.
> > at
> > org.apache.solr.client.solrj.io.graph.ShortestPathStream.
> > open(ShortestPathStream.java:385)
> > at
> > org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > open(ExceptionStream.java:51)
> > at
> > org.apache.solr.handler.StreamHandler$TimerStream.
> > open(StreamHandler.java:457)
> > at
> > org.apache.solr.client.solrj.io.stream.TupleStream.
> > writeMap(TupleStream.java:63)
> > at org.apache.solr.response.JSONWriter.writeMap(
> > JSONResponseWriter.java:547)
> > at
> > org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:193)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > JSONResponseWriter.java:209)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedList(
> > JSONResponseWriter.java:325)
> > at
> > org.apache.solr.response.JSONWriter.writeResponse(
> > JSONResponseWriter.java:120)
> > at
> > org.apache.solr.response.JSONResponseWriter.write(
> > JSONResponseWriter.java:71)
> > at
> > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:65)
> > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > HttpSolrCall.java:732)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:345)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:296)
> > at
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1691)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at
> > org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:548)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:226)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1180)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > ServletHandler.java:512)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1112)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:213)
> > at
> > org.eclipse.jetty.server.handler.HandlerCollection.
> > handle(HandlerCollection.java:119)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > at
> > org.eclipse.jetty.server.HttpConnection.onFillable(
> > HttpConnection.java:251)
> > at
> > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> > AbstractConnection.java:273)
> > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > at
> > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> > SelectChannelEndPoint.java:93)
> > at
> > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > 

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel,

I have only managed to find these above the stack trace.

2017-03-14 14:08:42.819 INFO  (qtp1543727556-2635) [   ]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
params={wt=json&_=1489500479108=0} status=0 QTime=0
2017-03-14 14:08:43.085 INFO  (qtp1543727556-2397) [c:email s:shard1
r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr path=/stream
params={indent=true=gatherNodes(email,+walk%3D"edwin@mail->from",+gather%3D"to")}
status=0 QTime=0
2017-03-14 14:08:43.116 INFO  (qtp1543727556-8207) [c:email s:shard1
r:core_node1 x:email] o.a.s.c.S.Request [email]  webapp=/solr path=/export
params={q={!terms+f%3Dfrom}ed...@mail.com=false=from,to=to+asc,from+asc=json=2.2}
hits=2471 status=0 QTime=19
2017-03-14 14:08:43.163 ERROR (qtp1543727556-2397) [c:email s:shard1
r:core_node1 x:email] o.a.s.c.s.i.s.ExceptionStream
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: java.io.IOException:
java.util.concurrent.ExecutionException: java.io.IOException: -->
http://localhost:8983/solr/email/: An exception has occurred on the server,
refer to server log for details.

I am getting these logs from {solrHome}/server/log. Is this the correct
folder to get the log, or is there another folder which may contain the
error?

Regards,
Edwin


On 14 March 2017 at 21:47, Joel Bernstein  wrote:

> You're getting json parse errors, that look like your getting an XML
> response. Do you see any errors in the logs other then the stack trace. I
> suspect there might be another error above the stack trace which shows the
> error from the server that causing it to respond with XML.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > >One thing it could be is that gatherNodes will only work on single value
> > >fields currently.
> >
> > Regarding this, the fields which I am using in the query is already a
> > single value field, not multi-value field.
> >
> > Regards,
> > Edwin
> >
> >
> > On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo 
> > wrote:
> >
> > > Hi Joel,
> > >
> > > This is the details which I get form the logs.
> > >
> > > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > > java.lang.RuntimeException: java.io.IOException: java.util.concurrent.
> > ExecutionException:
> > > java.io.IOException: --> http://localhost:8984/solr/email/: An
> exception
> > > has occurred on the server, refer to server log for details.
> > > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > > read(GatherNodesStream.java:600)
> > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > read(ExceptionStream.java:68)
> > > at org.apache.solr.handler.StreamHandler$TimerStream.
> > > read(StreamHandler.java:479)
> > > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> > > writeMap$0(TupleStream.java:67)
> > > at org.apache.solr.response.JSONWriter.writeIterator(
> > > JSONResponseWriter.java:523)
> > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:175)
> > > at org.apache.solr.response.JSONWriter$2.put(
> > JSONResponseWriter.java:559)
> > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > > writeMap(TupleStream.java:64)
> > > at org.apache.solr.response.JSONWriter.writeMap(
> > > JSONResponseWriter.java:547)
> > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:193)
> > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > > JSONResponseWriter.java:209)
> > > at org.apache.solr.response.JSONWriter.writeNamedList(
> > > JSONResponseWriter.java:325)
> > > at org.apache.solr.response.JSONWriter.writeResponse(
> > > JSONResponseWriter.java:120)
> > > at org.apache.solr.response.JSONResponseWriter.write(
> > > JSONResponseWriter.java:71)
> > > at org.apache.solr.response.QueryResponseWriterUtil.
> writeQueryResponse(
> > > QueryResponseWriterUtil.java:65)
> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > > HttpSolrCall.java:732)
> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:345)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:296)
> > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > > doFilter(ServletHandler.java:1691)
> > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > > ServletHandler.java:582)
> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:143)
> > > at org.eclipse.jetty.security.SecurityHandler.handle(
> > > SecurityHandler.java:548)
> > > at org.eclipse.jetty.server.session.SessionHandler.
> > > doHandle(SessionHandler.java:226)
> > > at org.eclipse.jetty.server.handler.ContextHandler.
> > > 

SOLR Data Locality

2017-03-14 Thread Muhammad Imad Qureshi
We have a 30-node Hadoop cluster and each data node also has a Solr instance 
running. Data is stored in HDFS. We are adding 10 nodes to the cluster. After 
adding the nodes, we'll run the HDFS balancer and also create Solr replicas on 
the new nodes. This will affect data locality. Does this impact how Solr works 
(I mean performance) if the data is on a remote node?

Thanks,
Imad


Re: Indexing CPU performance

2017-03-14 Thread Toke Eskildsen
On Tue, 2017-03-14 at 11:51 +0200, Mahmoud Almokadem wrote:
> Here is the profiler screenshot from VisualVM after upgrading
> 
> https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg
> 
> the jetty is taking the most time on CPU. Does this mean, the jetty
> is the bottleneck on indexing?

You need to sort on and look at the "Self Time (CPU)" column in
VisualVM, not the default "Self Time", to see where the power is used. 
The default is pretty useless for locating hot spots.

- Toke Eskildsen, Royal Danish Library


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Shawn Heisey
On 3/14/2017 3:08 AM, Gerald Reinhart wrote:
> Hi,
>The custom code we have is something like this :
> public class MySearchHandler extends SearchHandler {
> @Override
> public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
> SolrIndexSearcher searcher = req.getSearcher();
> try {
>  // Do stuff with the searcher
> } finally {
> req.close();
> }

>  Despite the fact that we always close the request each time we get
> a SolrIndexSearcher from the request, the number of SolrIndexSearcher
> instances is increasing. Each time a new commit is done on the index, a
> new Searcher is created (this is normal) but the old one remains. Is
> there something wrong with this custom code ?

My understanding of Solr and Lucene internals is rudimentary, but I
might know what's happening here.

The code closes the request, but never closes the searcher.  Searcher
objects include a Lucene object that holds onto the index files that
pertain to that view of the index.  The searcher must be closed.

It does look like if you close the searcher and then close the request,
that might be enough to fully decrement all the reference counters
involved, but I do not know the code well enough to be sure of that.
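
For reference, if the handler grabbed its searcher from the core instead
of from the request, the reference could be released explicitly. This is
only a rough, untested sketch:

import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// inside handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
RefCounted<SolrIndexSearcher> ref = req.getCore().getSearcher();
try {
  SolrIndexSearcher searcher = ref.get();
  // do stuff with the searcher
} finally {
  ref.decref();  // release this reference so old searchers can actually be closed
}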

Thanks,
Shawn



Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Looks like there might be something strange with your configuration. Did
you upgrade an existing install or is this a standard Solr 6.4.1 install?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 14, 2017 at 6:22 AM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I tired to run the following Streaming query with the shortestPath Stream
> Source.
>
> http://localhost:8983/solr/email/stream?expr=shortestPath(email,
>   from="ed...@mail.com",
> to="ad...@mail.com",
>   edge="from_address=to_address",
>   threads="6",
>   partitionSize="300",
>   maxDepth="4")=true
>
> I get the following error in the output:
>
> {
>   "result-set":{
> "docs":[{
> "EXCEPTION":"java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException:
> java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://localhost:8983/solr/email/: An exception has occurred on the
> server,
> refer to server log for details.",
> "EOF":true,
> "RESPONSE_TIME":112}]}}
>
>
> Here is the logs from the error log:
>
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException:
> java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://localhost:8983/solr/email/: An exception has occurred on the
> server,
> refer to server log for details.
> at
> org.apache.solr.client.solrj.io.graph.ShortestPathStream.
> open(ShortestPathStream.java:385)
> at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.
> open(ExceptionStream.java:51)
> at
> org.apache.solr.handler.StreamHandler$TimerStream.
> open(StreamHandler.java:457)
> at
> org.apache.solr.client.solrj.io.stream.TupleStream.
> writeMap(TupleStream.java:63)
> at org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:547)
> at
> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:193)
> at
> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> JSONResponseWriter.java:209)
> at
> org.apache.solr.response.JSONWriter.writeNamedList(
> JSONResponseWriter.java:325)
> at
> org.apache.solr.response.JSONWriter.writeResponse(
> JSONResponseWriter.java:120)
> at
> org.apache.solr.response.JSONResponseWriter.write(
> JSONResponseWriter.java:71)
> at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:65)
> at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:732)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:345)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:296)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceConsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProduceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> 

Re: Indexing CPU performance

2017-03-14 Thread Shawn Heisey
On 3/14/2017 3:35 AM, Mahmoud Almokadem wrote:
> After upgrading to 6.4.2 I got 3500+ docs/sec throughput with two uploading
> clients to solr which is good to me for the whole reindexing.
>
> I'll try Shawn code to posting to solr using HttpSolrClient instead of
> SolrCloudClient.

If the servers are running in SolrCloud mode, then you want
CloudSolrClient.  You do not want HttpSolrClient.  The servers that my
code talks to are not in Cloud mode, so I use the Http version.

If you want to be able to use multiple threads for indexing, you will
still need to create a custom HttpClient -- that's the point of the code
snippet I shared.  Replace "HttpSolrClient" with "CloudSolrClient" and
"serverBaseUrl" with "zkHost" and it would work for cloud mode.

Thanks,
Shawn



Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
You're getting JSON parse errors; it looks like you're getting an XML
response. Do you see any errors in the logs other than the stack trace? I
suspect there might be another error above the stack trace which shows the
error from the server that is causing it to respond with XML.



Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 13, 2017 at 11:01 PM, Zheng Lin Edwin Yeo 
wrote:

> Hi Joel,
>
> >One thing it could be is that gatherNodes will only work on single value
> >fields currently.
>
> Regarding this, the fields which I am using in the query is already a
> single value field, not multi-value field.
>
> Regards,
> Edwin
>
>
> On 14 March 2017 at 10:04, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Joel,
> >
> > This is the details which I get form the logs.
> >
> > java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> > java.lang.RuntimeException: java.io.IOException: java.util.concurrent.
> ExecutionException:
> > java.io.IOException: --> http://localhost:8984/solr/email/: An exception
> > has occurred on the server, refer to server log for details.
> > at org.apache.solr.client.solrj.io.graph.GatherNodesStream.
> > read(GatherNodesStream.java:600)
> > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > read(ExceptionStream.java:68)
> > at org.apache.solr.handler.StreamHandler$TimerStream.
> > read(StreamHandler.java:479)
> > at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> > writeMap$0(TupleStream.java:67)
> > at org.apache.solr.response.JSONWriter.writeIterator(
> > JSONResponseWriter.java:523)
> > at org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:175)
> > at org.apache.solr.response.JSONWriter$2.put(
> JSONResponseWriter.java:559)
> > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > writeMap(TupleStream.java:64)
> > at org.apache.solr.response.JSONWriter.writeMap(
> > JSONResponseWriter.java:547)
> > at org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:193)
> > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > JSONResponseWriter.java:209)
> > at org.apache.solr.response.JSONWriter.writeNamedList(
> > JSONResponseWriter.java:325)
> > at org.apache.solr.response.JSONWriter.writeResponse(
> > JSONResponseWriter.java:120)
> > at org.apache.solr.response.JSONResponseWriter.write(
> > JSONResponseWriter.java:71)
> > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:65)
> > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > HttpSolrCall.java:732)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:345)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:296)
> > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1691)
> > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > ServletHandler.java:582)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:548)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:226)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1180)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > ServletHandler.java:512)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1112)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:213)
> > at org.eclipse.jetty.server.handler.HandlerCollection.
> > handle(HandlerCollection.java:119)
> > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > at org.eclipse.jetty.server.HttpConnection.onFillable(
> > HttpConnection.java:251)
> > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> > AbstractConnection.java:273)
> > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> > SelectChannelEndPoint.java:93)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > executeProduceConsume(ExecuteProduceConsume.java:303)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > produceConsume(ExecuteProduceConsume.java:148)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> > ExecuteProduceConsume.java:136)
> > at 

Suggestions from different dictionaries dynamically

2017-03-14 Thread vuppalasubbarao
Hi,

We have two fields, "teacher_field" and "school_field", along with other
fields like "source". We have created a single dictionary from both of these
fields. When I search with a misspelling of a "teacher_field" value, I also
get spelling suggestions from "school_field". Instead, I want to get
suggestions from only one of the fields, based on an additional filter
field like "source".

In this case, is it possible to create two dictionaries and use one of them
at query time, based on another query field [fq=source:"TEACHER" or
fq=source:"COLLEGE"]?

*/SAMPLE DOCS:/*

1, michael, newyork, teacher

2, rajan, michigan, school


/*QUERY:*/
q=michial&fq=source:teacher   ---> getting suggestion as "michigan". Instead
I expect "michael".

/*SCHEMA.XML*/
 
 





Thanks,
Subbarao
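For illustration, one way such a setup could look (a rough sketch only, with made-up dictionary names and the request-handler wiring omitted): Solr's SpellCheckComponent allows several named spellcheckers, and the spellcheck.dictionary parameter picks one of them per request.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">teacher_dict</str>
    <str name="field">teacher_field</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">school_dict</str>
    <str name="field">school_field</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

A query could then select the dictionary that matches the filter, for example:

q=michial&fq=source:teacher&spellcheck=true&spellcheck.dictionary=teacher_dict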





SortingMergePolicy in solr 6.4.2

2017-03-14 Thread Sahil Agarwal
The SortingMergePolicy does not seem to get applied.

The csv file gets indexed without errors. But when I search for a term, the
results returned are not sorted by Marks.

Following is a toy project in Solr 6.4.2 on which I tried to use
SortingMergePolicyFactory.

Just showing the changes that I did in the core's config files. Please tell
me if any other info is needed.
I used the basic_configs when creating core:
create_core -c corename -d basic_configs


managed-schema


.
.
.
  


solrconfig.xml



<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">Marks desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
</mergePolicyFactory>

1.csv

id,Name,Subject,Marks 1,Sahil Agarwal,Computers,1108 2,Ian
Roberts,Maths,7077 3,Karan Vatsa,English,6092 4,Amit Williams,Maths,3924
5,Vani Agarwal,Computers,4263 6,Brenda Gupta,Computers,2309
.
.
30 rows

What could be the problem?


Re: Modifying solrconfig.xml in solr cloud

2017-03-14 Thread Binoy Dalal
Thanks Erick. Missed that somehow.

On Tue, 14 Mar 2017, 10:44 Erick Erickson,  wrote:

> First hit from googling "solr config API"
>
> https://cwiki.apache.org/confluence/display/solr/Config+API
>
> Best,
> Erick
>
> On Mon, Mar 13, 2017 at 8:27 PM, Binoy Dalal 
> wrote:
> > Is there a simpler way of modifying solrconfig.xml in cloud mode without
> > having to download the file from zookeeper, modifying it and reuploading
> it?
> >
> > Something like the schema API maybe?
> > --
> > Regards,
> > Binoy Dalal
>
-- 
Regards,
Binoy Dalal
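For illustration, the Config API linked above can be driven with a plain HTTP call; a minimal sketch (the collection name and property below are hypothetical examples):

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/mycollection/config \
  -d '{"set-property": {"updateHandler.autoCommit.maxTime": 15000}}'

Changes made this way are persisted by Solr in a config overlay, so the solrconfig.xml in ZooKeeper does not have to be edited and re-uploaded by hand.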


Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
After upgrading to 6.4.2 I got 3500+ docs/sec throughput with two uploading
clients to Solr, which is good enough for me for the whole reindexing.

I'll try Shawn's code for posting to Solr using HttpSolrClient instead of
SolrCloudClient.

Thanks to all,
Mahmoud

On Tue, Mar 14, 2017 at 10:23 AM, Mahmoud Almokadem 
wrote:

>
> I'm using VisualVM and sematext to monitor my cluster.
>
> Below is screenshots for each of them.
>
> https://drive.google.com/open?id=0BwLcshoSCVcdWHRJeUNyekxWN28
>
> https://drive.google.com/open?id=0BwLcshoSCVcdZzhTRGVjYVJBUzA
>
> https://drive.google.com/open?id=0BwLcshoSCVcdc0dQZGJtMWxDOFk
>
> https://drive.google.com/open?id=0BwLcshoSCVcdR3hJSHRZTjdSZm8
>
> https://drive.google.com/open?id=0BwLcshoSCVcdUzRETDlFeFIxU2M
>
> Thanks,
> Mahmoud
>
> On Tue, Mar 14, 2017 at 10:20 AM, Mahmoud Almokadem <
> prog.mahm...@gmail.com> wrote:
>
>> Thanks Erick,
>>
>> I think there are something missing, the rate I'm talking about is for
>> bulk upload and one time indexing to on-going indexing.
>> My dataset is about 250 million documents and I need to index them to
>> solr.
>>
>> Thanks Shawn for your clarification,
>>
>> I think that I got stuck on this version 6.4.1 I'll upgrade my cluster
>> and test again.
>>
>> Thanks for help
>> Mahmoud
>>
>>
>> On Tue, Mar 14, 2017 at 1:20 AM, Shawn Heisey 
>> wrote:
>>
>>> On 3/13/2017 7:58 AM, Mahmoud Almokadem wrote:
>>> > When I start my bulk indexer program the CPU utilization is 100% on
>>> each
>>> > server but the rate of the indexer is about 1500 docs per second.
>>> >
>>> > I know that some solr benchmarks reached 70,000+ doc. per second.
>>>
>>> There are *MANY* factors that affect indexing rate.  When you say that
>>> the CPU utilization is 100 percent, what operating system are you
>>> running and what tool are you using to see CPU percentage?  Within that
>>> tool, where are you looking to see that usage level?
>>>
>>> On some operating systems with some reporting tools, a server with 8 CPU
>>> cores can show up to 800 percent CPU usage, so 100 percent utilization
>>> on the Solr process may not be full utilization of the server's
>>> resources.  It also might be an indicator of the full system usage, if
>>> you are looking in the right place.
>>>
>>> > The question: What is the best way to determine the bottleneck on solr
>>> > indexing rate?
>>>
>>> I have two likely candidates for you.  The first one is a bug that
>>> affects Solr 6.4.0 and 6.4.1, which is fixed by 6.4.2.  If you don't
>>> have one of those two versions, then this is not affecting you:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-10130
>>>
>>> The other likely bottleneck, which could be a problem whether or not the
>>> previous bug is present, is single-threaded indexing, so every batch of
>>> docs must wait for the previous batch to finish before it can begin, and
>>> only one CPU gets utilized on the server side.  Both Solr and SolrJ are
>>> fully capable of handling several indexing threads at once, and that is
>>> really the only way to achieve maximum indexing performance.  If you
>>> want multi-threaded (parallel) indexing, you must create the threads on
>>> the client side, or run multiple indexing processes that each handle
>>> part of the job.  Multi-threaded code is not easy to write correctly.
>>>
>>> The fieldTypes and analysis that you have configured in your schema may
>>> include classes that process very slowly, or may include so many filters
>>> that the end result is slow performance.  I am not familiar with the
>>> performance of the classes that Solr includes, so I would not be able to
>>> look at a schema and tell you which entries are slow.  As Erick
>>> mentioned, processing for 300+ fields could be one reason for slow
>>> indexing.
>>>
>>> If you are doing a commit operation for every batch, that will slow it
>>> down even more.  If you have autoSoftCommit configured with a very low
>>> maxTime or maxDocs value, that can result in extremely frequent commits
>>> that make indexing much slower.  Although frequent autoCommit is very
>>> much desirable for good operation (as long as openSearcher set to
>>> false), commits that open new searchers should be much less frequent.
>>> The best option is to only commit (with a new searcher) *once* at the
>>> end of the indexing run.  If automatic soft commits are desired, make
>>> them happen as infrequently as you can.
>>>
>>> https://lucidworks.com/understanding-transaction-logs-softco
>>> mmit-and-commit-in-sorlcloud/
>>>
>>> Using CloudSolrClient will make single-threaded indexing fairly
>>> efficient, by always sending documents to the correct shard leader.  FYI
>>> -- your 500 document batches are split into smaller batches (which I
>>> think are only 10 documents) that are directed to correct shard leaders
>>> by CloudSolrClient.  Indexing with multiple threads becomes even more
>>> important with these smaller batches.
>>>
>>> Note that with SolrJ, you will 

Error with Streaming Expressions - shortestPath

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi,

I tried to run the following Streaming query with the shortestPath Stream
Source.

http://localhost:8983/solr/email/stream?expr=shortestPath(email,
  from="ed...@mail.com",
to="ad...@mail.com",
  edge="from_address=to_address",
  threads="6",
  partitionSize="300",
  maxDepth="4")=true

I get the following error in the output:

{
  "result-set":{
"docs":[{
"EXCEPTION":"java.util.concurrent.ExecutionException:
java.lang.RuntimeException: java.io.IOException:
java.util.concurrent.ExecutionException: java.io.IOException: -->
http://localhost:8983/solr/email/: An exception has occurred on the server,
refer to server log for details.",
"EOF":true,
"RESPONSE_TIME":112}]}}


Here is the logs from the error log:

java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: java.io.IOException:
java.util.concurrent.ExecutionException: java.io.IOException: -->
http://localhost:8983/solr/email/: An exception has occurred on the server,
refer to server log for details.
at
org.apache.solr.client.solrj.io.graph.ShortestPathStream.open(ShortestPathStream.java:385)
at
org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
at
org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:457)
at
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:63)
at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: java.io.IOException:
java.util.concurrent.ExecutionException: java.io.IOException: -->
http://localhost:8983/solr/email/: An exception has occurred on the server,
refer to server log for details.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.apache.solr.client.solrj.io.graph.ShortestPathStream.open(ShortestPathStream.java:357)
... 39 more
Caused by: java.lang.RuntimeException: java.io.IOException:

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Here is the profiler screenshot from VisualVM after upgrading

https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg

Jetty is taking the most CPU time. Does this mean Jetty is the bottleneck for
indexing?

Thanks,
Mahmoud


On Tue, Mar 14, 2017 at 11:41 AM, Mahmoud Almokadem 
wrote:

> Thanks Shalin,
>
> I'm posting data to solr with SolrInputDocument using SolrJ.
>
> According to the profiler, the com.codahale.metrics.Meter.mark is take
> much processing than others as mentioned on this issue
> https://issues.apache.org/jira/browse/SOLR-10130.
>
> And I think the profiler of sematext is different than VisualVM.
>
> Thanks for help,
> Mahmoud
>
>
>
> On Tue, Mar 14, 2017 at 11:08 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> According to the profiler output, a significant amount of cpu is being
>> spent in JSON parsing but your previous email said that you use SolrJ.
>> SolrJ uses the javabin binary format to send documents to Solr and it
>> never ever uses JSON so there is definitely some other indexing
>> process that you have not accounted for.
>>
>> On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem
>>  wrote:
>> > Thanks Erick,
>> >
>> > I've commented out the line SolrClient.add(doclist) and get 5500+ docs
>> per
>> > second from single producer.
>> >
>> > Regarding more shards, you mean use 2 nodes with 8 shards per node so we
>> > got 16 shards on the same 2 nodes or spread shards over more nodes?
>> >
>> > I'm using solr 6.4.1 with zookeeper on the same nodes.
>> >
>> > Here's what I got from sematext profiler
>> >
>> > 51%
>> > Thread.java:745java.lang.Thread#run
>> >
>> > 42%
>> > QueuedThreadPool.java:589
>> > org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
>> > Collapsed 29 calls (Expand)
>> >
>> > 43%
>> > UpdateRequestHandler.java:97
>> > org.apache.solr.handler.UpdateRequestHandler$1#load
>> >
>> > 30%
>> > JsonLoader.java:78org.apache.solr.handler.loader.JsonLoader#load
>> >
>> > 30%
>> > JsonLoader.java:115
>> > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
>> >
>> > 13%
>> > JavabinLoader.java:54org.apache.solr.handler.loader.JavabinLoader#load
>> >
>> > 9%
>> > ThreadPoolExecutor.java:617
>> > java.util.concurrent.ThreadPoolExecutor$Worker#run
>> >
>> > 9%
>> > ThreadPoolExecutor.java:1142
>> > java.util.concurrent.ThreadPoolExecutor#runWorker
>> >
>> > 33%
>> > ConcurrentMergeScheduler.java:626
>> > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
>> >
>> > 33%
>> > ConcurrentMergeScheduler.java:588
>> > org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
>> >
>> > 33%
>> > SolrIndexWriter.java:233org.apache.solr.update.SolrIndexWriter#merge
>> >
>> > 33%
>> > IndexWriter.java:3920org.apache.lucene.index.IndexWriter#merge
>> >
>> > 33%
>> > IndexWriter.java:4343org.apache.lucene.index.IndexWriter#mergeMiddle
>> >
>> > 20%
>> > SegmentMerger.java:101org.apache.lucene.index.SegmentMerger#merge
>> >
>> > 11%
>> > SegmentMerger.java:89org.apache.lucene.index.SegmentMerger#merge
>> >
>> > 2%
>> > SegmentMerger.java:144org.apache.lucene.index.SegmentMerger#merge
>> >
>> >
>> > On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Note that 70,000 docs/second pretty much guarantees that there are
>> >> multiple shards. Lots of shards.
>> >>
>> >> But since you're using SolrJ, the  very first thing I'd try would be
>> >> to comment out the SolrClient.add(doclist) call so you're doing
>> >> everything _except_ send the docs to Solr. That'll tell you whether
>> >> there's any bottleneck on getting the docs from the system of record.
>> >> The fact that you're pegging the CPUs argues that you are feeding Solr
>> >> as fast as Solr can go so this is just a sanity check. But it's
>> >> simple/fast.
>> >>
>> >> As far as what on Solr could be the bottleneck, no real way to know
>> >> without profiling. But 300+ fields per doc probably just means you're
>> >> doing a lot of processing, I'm not particularly hopeful you'll be able
>> >> to speed things up without either more shards or simplifying your
>> >> schema.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
>> >>  wrote:
>> >> > Hi great community,
>> >> >
>> >> > I have a SolrCloud with the following configuration:
>> >> >
>> >> >- 2 nodes (r3.2xlarge 61GB RAM)
>> >> >- 4 shards.
>> >> >- The producer can produce 13,000+ docs per second
>> >> >- The schema contains about 300+ fields and the document size is
>> about
>> >> >3KB.
>> >> >- Using SolrJ and SolrCloudClient, each batch to solr contains 500
>> >> docs.
>> >> >
>> >> > When I start my bulk indexer program the CPU utilization is 100% on
>> each
>> >> > server but the rate of the indexer is about 1500 docs per second.
>> >> >
>> >> > I know that some solr benchmarks reached 70,000+ doc. per second.
>> >> >
>> >> 

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Shalin,

I'm posting data to solr with SolrInputDocument using SolrJ.

According to the profiler, com.codahale.metrics.Meter.mark takes much more
processing time than anything else, as mentioned in this issue:
https://issues.apache.org/jira/browse/SOLR-10130.

And I think the sematext profiler is different from VisualVM.

Thanks for help,
Mahmoud



On Tue, Mar 14, 2017 at 11:08 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> According to the profiler output, a significant amount of cpu is being
> spent in JSON parsing but your previous email said that you use SolrJ.
> SolrJ uses the javabin binary format to send documents to Solr and it
> never ever uses JSON so there is definitely some other indexing
> process that you have not accounted for.
>
> On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem
>  wrote:
> > Thanks Erick,
> >
> > I've commented out the line SolrClient.add(doclist) and get 5500+ docs
> per
> > second from single producer.
> >
> > Regarding more shards, you mean use 2 nodes with 8 shards per node so we
> > got 16 shards on the same 2 nodes or spread shards over more nodes?
> >
> > I'm using solr 6.4.1 with zookeeper on the same nodes.
> >
> > Here's what I got from sematext profiler
> >
> > 51%
> > Thread.java:745java.lang.Thread#run
> >
> > 42%
> > QueuedThreadPool.java:589
> > org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
> > Collapsed 29 calls (Expand)
> >
> > 43%
> > UpdateRequestHandler.java:97
> > org.apache.solr.handler.UpdateRequestHandler$1#load
> >
> > 30%
> > JsonLoader.java:78org.apache.solr.handler.loader.JsonLoader#load
> >
> > 30%
> > JsonLoader.java:115
> > org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
> >
> > 13%
> > JavabinLoader.java:54org.apache.solr.handler.loader.JavabinLoader#load
> >
> > 9%
> > ThreadPoolExecutor.java:617
> > java.util.concurrent.ThreadPoolExecutor$Worker#run
> >
> > 9%
> > ThreadPoolExecutor.java:1142
> > java.util.concurrent.ThreadPoolExecutor#runWorker
> >
> > 33%
> > ConcurrentMergeScheduler.java:626
> > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
> >
> > 33%
> > ConcurrentMergeScheduler.java:588
> > org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
> >
> > 33%
> > SolrIndexWriter.java:233org.apache.solr.update.SolrIndexWriter#merge
> >
> > 33%
> > IndexWriter.java:3920org.apache.lucene.index.IndexWriter#merge
> >
> > 33%
> > IndexWriter.java:4343org.apache.lucene.index.IndexWriter#mergeMiddle
> >
> > 20%
> > SegmentMerger.java:101org.apache.lucene.index.SegmentMerger#merge
> >
> > 11%
> > SegmentMerger.java:89org.apache.lucene.index.SegmentMerger#merge
> >
> > 2%
> > SegmentMerger.java:144org.apache.lucene.index.SegmentMerger#merge
> >
> >
> > On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson  >
> > wrote:
> >
> >> Note that 70,000 docs/second pretty much guarantees that there are
> >> multiple shards. Lots of shards.
> >>
> >> But since you're using SolrJ, the  very first thing I'd try would be
> >> to comment out the SolrClient.add(doclist) call so you're doing
> >> everything _except_ send the docs to Solr. That'll tell you whether
> >> there's any bottleneck on getting the docs from the system of record.
> >> The fact that you're pegging the CPUs argues that you are feeding Solr
> >> as fast as Solr can go so this is just a sanity check. But it's
> >> simple/fast.
> >>
> >> As far as what on Solr could be the bottleneck, no real way to know
> >> without profiling. But 300+ fields per doc probably just means you're
> >> doing a lot of processing, I'm not particularly hopeful you'll be able
> >> to speed things up without either more shards or simplifying your
> >> schema.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
> >>  wrote:
> >> > Hi great community,
> >> >
> >> > I have a SolrCloud with the following configuration:
> >> >
> >> >- 2 nodes (r3.2xlarge 61GB RAM)
> >> >- 4 shards.
> >> >- The producer can produce 13,000+ docs per second
> >> >- The schema contains about 300+ fields and the document size is
> about
> >> >3KB.
> >> >- Using SolrJ and SolrCloudClient, each batch to solr contains 500
> >> docs.
> >> >
> >> > When I start my bulk indexer program the CPU utilization is 100% on
> each
> >> > server but the rate of the indexer is about 1500 docs per second.
> >> >
> >> > I know that some solr benchmarks reached 70,000+ doc. per second.
> >> >
> >> > The question: What is the best way to determine the bottleneck on solr
> >> > indexing rate?
> >> >
> >> > Thanks,
> >> > Mahmoud
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Indexing CPU performance

2017-03-14 Thread Shalin Shekhar Mangar
According to the profiler output, a significant amount of cpu is being
spent in JSON parsing but your previous email said that you use SolrJ.
SolrJ uses the javabin binary format to send documents to Solr and it
never ever uses JSON so there is definitely some other indexing
process that you have not accounted for.

On Tue, Mar 14, 2017 at 12:31 AM, Mahmoud Almokadem
 wrote:
> Thanks Erick,
>
> I've commented out the line SolrClient.add(doclist) and get 5500+ docs per
> second from single producer.
>
> Regarding more shards, you mean use 2 nodes with 8 shards per node so we
> got 16 shards on the same 2 nodes or spread shards over more nodes?
>
> I'm using solr 6.4.1 with zookeeper on the same nodes.
>
> Here's what I got from sematext profiler
>
> 51%
> Thread.java:745java.lang.Thread#run
>
> 42%
> QueuedThreadPool.java:589
> org.eclipse.jetty.util.thread.QueuedThreadPool$2#run
> Collapsed 29 calls (Expand)
>
> 43%
> UpdateRequestHandler.java:97
> org.apache.solr.handler.UpdateRequestHandler$1#load
>
> 30%
> JsonLoader.java:78org.apache.solr.handler.loader.JsonLoader#load
>
> 30%
> JsonLoader.java:115
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader#load
>
> 13%
> JavabinLoader.java:54org.apache.solr.handler.loader.JavabinLoader#load
>
> 9%
> ThreadPoolExecutor.java:617
> java.util.concurrent.ThreadPoolExecutor$Worker#run
>
> 9%
> ThreadPoolExecutor.java:1142
> java.util.concurrent.ThreadPoolExecutor#runWorker
>
> 33%
> ConcurrentMergeScheduler.java:626
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread#run
>
> 33%
> ConcurrentMergeScheduler.java:588
> org.apache.lucene.index.ConcurrentMergeScheduler#doMerge
>
> 33%
> SolrIndexWriter.java:233org.apache.solr.update.SolrIndexWriter#merge
>
> 33%
> IndexWriter.java:3920org.apache.lucene.index.IndexWriter#merge
>
> 33%
> IndexWriter.java:4343org.apache.lucene.index.IndexWriter#mergeMiddle
>
> 20%
> SegmentMerger.java:101org.apache.lucene.index.SegmentMerger#merge
>
> 11%
> SegmentMerger.java:89org.apache.lucene.index.SegmentMerger#merge
>
> 2%
> SegmentMerger.java:144org.apache.lucene.index.SegmentMerger#merge
>
>
> On Mon, Mar 13, 2017 at 5:12 PM, Erick Erickson 
> wrote:
>
>> Note that 70,000 docs/second pretty much guarantees that there are
>> multiple shards. Lots of shards.
>>
>> But since you're using SolrJ, the  very first thing I'd try would be
>> to comment out the SolrClient.add(doclist) call so you're doing
>> everything _except_ send the docs to Solr. That'll tell you whether
>> there's any bottleneck on getting the docs from the system of record.
>> The fact that you're pegging the CPUs argues that you are feeding Solr
>> as fast as Solr can go so this is just a sanity check. But it's
>> simple/fast.
>>
>> As far as what on Solr could be the bottleneck, no real way to know
>> without profiling. But 300+ fields per doc probably just means you're
>> doing a lot of processing, I'm not particularly hopeful you'll be able
>> to speed things up without either more shards or simplifying your
>> schema.
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 13, 2017 at 6:58 AM, Mahmoud Almokadem
>>  wrote:
>> > Hi great community,
>> >
>> > I have a SolrCloud with the following configuration:
>> >
>> >- 2 nodes (r3.2xlarge 61GB RAM)
>> >- 4 shards.
>> >- The producer can produce 13,000+ docs per second
>> >- The schema contains about 300+ fields and the document size is about
>> >3KB.
>> >- Using SolrJ and SolrCloudClient, each batch to solr contains 500
>> docs.
>> >
>> > When I start my bulk indexer program the CPU utilization is 100% on each
>> > server but the rate of the indexer is about 1500 docs per second.
>> >
>> > I know that some solr benchmarks reached 70,000+ doc. per second.
>> >
>> > The question: What is the best way to determine the bottleneck on solr
>> > indexing rate?
>> >
>> > Thanks,
>> > Mahmoud
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Gerald Reinhart


Hi,

   The custom code we have is something like this :

public class MySearchHandler extends SearchHandler {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        SolrIndexSearcher searcher = req.getSearcher();
        try {
            // Do stuff with the searcher
        } finally {
            req.close();
        }
    }
}

   Despite the fact that we always close the request each time we get a
SolrIndexSearcher from it, the number of SolrIndexSearcher instances keeps
increasing. Each time a new commit is done on the index, a new Searcher is
created (this is normal) but the old one remains. Is there something wrong
with this custom code? Should we try something like what is explained here:
http://stackoverflow.com/questions/20515493/solr-huge-number-of-open-searchers ?

Thanks,
Gérald Reinhart (working with Elodie on the subject)

On 03/07/2017 05:45 PM, Elodie Sannier wrote:

Thank you Erick for your answer.

The files are deleted even without JVM restart but they are still seen
as DELETED by the kernel.

We have custom code, and for the migration to Solr 6.4.0 we added new
code that calls req.getSearcher() but without a "close".
We will decrement the reference count on the Searcher resource (to
prevent the Searcher from remaining open after a commit) and see if it
fixes the problem.
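A minimal sketch of the reference-counting pattern being described, assuming standard Solr 6.x APIs (illustrative only, not the exact change): a searcher obtained from SolrCore.getSearcher() comes wrapped in a RefCounted and must be released with decRef().

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class SearcherAccess {
    // Hold a searcher with explicit reference counting instead of closing the request.
    public static void withSearcher(SolrQueryRequest req) {
        RefCounted<SolrIndexSearcher> searcherRef = req.getCore().getSearcher();
        try {
            SolrIndexSearcher searcher = searcherRef.get();
            // Do stuff with the searcher
        } finally {
            searcherRef.decRef();  // release it so old searchers can be closed after a commit
        }
    }
}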

Elodie

On 03/07/2017 03:55 PM, Erick Erickson wrote:

Just as a sanity check, if you restart the Solr JVM, do the files
disappear from disk?

Do you have any custom code anywhere in this chain? If so, do you open
any searchers but
fail to close them? Although why 6.4 would manifest the problem but
other code wouldn't
is a mystery, just another sanity check.

Best,
Erick

On Tue, Mar 7, 2017 at 6:44 AM, Elodie Sannier  wrote:

Hello,

We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has
increased.
We found hundreds of references to deleted index files being held by solr.
Before the migration, we had 15-30% of disk space used, after the migration
we have 60-90% of disk space used.

We are using Solr Cloud with 2 collections.

The commands applied on the collections are:
- for incremental indexation mode: add, deleteById with commitWithin of 30
minutes
- for full indexation mode: add, deleteById, commit
- for switch between incremental and full mode: deleteByQuery, createAlias,
reload
- there is also an autocommit every 15 minutes

We have seen the email "Solr leaking references to deleted files"
2016-05-31 which describe the same problem but the mentioned bugs are fixed.

We manually tried to force a commit, a reload and an optimize on the
collections without effect.

Is it a problem of configuration (merge / delete policy) or a possible
regression in the Solr code?

Thank you




--
Elodie Sannier
Software engineer
E: elodie.sann...@kelkoo.fr  Skype: kelkooelodies
T: +33 (0)4 56 09 07 55
A: Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles



--
Gérald Reinhart
Software Engineer
E: gerald.reinh...@kelkoo.com
A: Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles, FR




Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Erick,

I think there is something missing: the rate I'm talking about is for a bulk
upload and one-time reindexing, not for on-going indexing.
My dataset is about 250 million documents and I need to index them into Solr.

Thanks Shawn for your clarification,

I think I got stuck on version 6.4.1; I'll upgrade my cluster and test
again.

Thanks for help
Mahmoud


On Tue, Mar 14, 2017 at 1:20 AM, Shawn Heisey  wrote:

> On 3/13/2017 7:58 AM, Mahmoud Almokadem wrote:
> > When I start my bulk indexer program the CPU utilization is 100% on each
> > server but the rate of the indexer is about 1500 docs per second.
> >
> > I know that some solr benchmarks reached 70,000+ doc. per second.
>
> There are *MANY* factors that affect indexing rate.  When you say that
> the CPU utilization is 100 percent, what operating system are you
> running and what tool are you using to see CPU percentage?  Within that
> tool, where are you looking to see that usage level?
>
> On some operating systems with some reporting tools, a server with 8 CPU
> cores can show up to 800 percent CPU usage, so 100 percent utilization
> on the Solr process may not be full utilization of the server's
> resources.  It also might be an indicator of the full system usage, if
> you are looking in the right place.
>
> > The question: What is the best way to determine the bottleneck on solr
> > indexing rate?
>
> I have two likely candidates for you.  The first one is a bug that
> affects Solr 6.4.0 and 6.4.1, which is fixed by 6.4.2.  If you don't
> have one of those two versions, then this is not affecting you:
>
> https://issues.apache.org/jira/browse/SOLR-10130
>
> The other likely bottleneck, which could be a problem whether or not the
> previous bug is present, is single-threaded indexing, so every batch of
> docs must wait for the previous batch to finish before it can begin, and
> only one CPU gets utilized on the server side.  Both Solr and SolrJ are
> fully capable of handling several indexing threads at once, and that is
> really the only way to achieve maximum indexing performance.  If you
> want multi-threaded (parallel) indexing, you must create the threads on
> the client side, or run multiple indexing processes that each handle
> part of the job.  Multi-threaded code is not easy to write correctly.
>
> The fieldTypes and analysis that you have configured in your schema may
> include classes that process very slowly, or may include so many filters
> that the end result is slow performance.  I am not familiar with the
> performance of the classes that Solr includes, so I would not be able to
> look at a schema and tell you which entries are slow.  As Erick
> mentioned, processing for 300+ fields could be one reason for slow
> indexing.
>
> If you are doing a commit operation for every batch, that will slow it
> down even more.  If you have autoSoftCommit configured with a very low
> maxTime or maxDocs value, that can result in extremely frequent commits
> that make indexing much slower.  Although frequent autoCommit is very
> much desirable for good operation (as long as openSearcher set to
> false), commits that open new searchers should be much less frequent.
> The best option is to only commit (with a new searcher) *once* at the
> end of the indexing run.  If automatic soft commits are desired, make
> them happen as infrequently as you can.
>
> https://lucidworks.com/understanding-transaction-
> logs-softcommit-and-commit-in-sorlcloud/
>
> Using CloudSolrClient will make single-threaded indexing fairly
> efficient, by always sending documents to the correct shard leader.  FYI
> -- your 500 document batches are split into smaller batches (which I
> think are only 10 documents) that are directed to correct shard leaders
> by CloudSolrClient.  Indexing with multiple threads becomes even more
> important with these smaller batches.
>
> Note that with SolrJ, you will need to tweak the HttpClient creation, or
> you will likely find that each SolrJ client object can only utilize two
> threads to each Solr server.  The default per-route maximum connection
> limit for HttpClient is 2, with a total connection limit of 20.
>
> This code snippet shows how I create a Solr client that can do many
> threads (300 per route, 5000 total) and also has custom timeout settings:
>
> RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
> .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
> httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
> .setMaxConnPerRoute(300).setMaxConnTotal(5000)
> .disableAutomaticRetries().build();
> client = new HttpSolrClient(serverBaseUrl, httpClient);
>
> This is using HttpSolrClient, but CloudSolrClient can be built in a
> similar manner.  I am not yet using the new SolrJ Builder paradigm found
> in 6.x, I should switch my code to that.
>
> Thanks,
> Shawn
>
>
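For illustration, a minimal sketch of the multi-threaded indexing pattern described in the quoted message (assuming SolrJ 6.x; the ZooKeeper hosts, collection name, thread count, and fetchNextBatch() helper are hypothetical placeholders):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient is thread-safe, so one instance can be shared by all indexing threads.
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        client.setDefaultCollection("collection1");

        int threads = 8;  // assumed thread count; tune to the available CPU cores
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                List<SolrInputDocument> batch;
                while ((batch = fetchNextBatch()) != null) {
                    client.add(batch);  // send a batch of ~500 docs; no commit here
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);

        client.commit();  // single commit (new searcher) at the end of the run
        client.close();
    }

    // Placeholder: pull the next batch of documents from the system of record
    // (must be thread-safe); return null when there is nothing left to index.
    private static synchronized List<SolrInputDocument> fetchNextBatch() {
        return null;
    }
}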


Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
I'm using VisualVM and sematext to monitor my cluster.

Below is screenshots for each of them.

https://drive.google.com/open?id=0BwLcshoSCVcdWHRJeUNyekxWN28

https://drive.google.com/open?id=0BwLcshoSCVcdZzhTRGVjYVJBUzA

https://drive.google.com/open?id=0BwLcshoSCVcdc0dQZGJtMWxDOFk

https://drive.google.com/open?id=0BwLcshoSCVcdR3hJSHRZTjdSZm8

https://drive.google.com/open?id=0BwLcshoSCVcdUzRETDlFeFIxU2M

Thanks,
Mahmoud

On Tue, Mar 14, 2017 at 10:20 AM, Mahmoud Almokadem 
wrote:

> Thanks Erick,
>
> I think there are something missing, the rate I'm talking about is for
> bulk upload and one time indexing to on-going indexing.
> My dataset is about 250 million documents and I need to index them to solr.
>
> Thanks Shawn for your clarification,
>
> I think that I got stuck on this version 6.4.1 I'll upgrade my cluster and
> test again.
>
> Thanks for help
> Mahmoud
>
>
> On Tue, Mar 14, 2017 at 1:20 AM, Shawn Heisey  wrote:
>
>> On 3/13/2017 7:58 AM, Mahmoud Almokadem wrote:
>> > When I start my bulk indexer program the CPU utilization is 100% on each
>> > server but the rate of the indexer is about 1500 docs per second.
>> >
>> > I know that some solr benchmarks reached 70,000+ doc. per second.
>>
>> There are *MANY* factors that affect indexing rate.  When you say that
>> the CPU utilization is 100 percent, what operating system are you
>> running and what tool are you using to see CPU percentage?  Within that
>> tool, where are you looking to see that usage level?
>>
>> On some operating systems with some reporting tools, a server with 8 CPU
>> cores can show up to 800 percent CPU usage, so 100 percent utilization
>> on the Solr process may not be full utilization of the server's
>> resources.  It also might be an indicator of the full system usage, if
>> you are looking in the right place.
>>
>> > The question: What is the best way to determine the bottleneck on solr
>> > indexing rate?
>>
>> I have two likely candidates for you.  The first one is a bug that
>> affects Solr 6.4.0 and 6.4.1, which is fixed by 6.4.2.  If you don't
>> have one of those two versions, then this is not affecting you:
>>
>> https://issues.apache.org/jira/browse/SOLR-10130
>>
>> The other likely bottleneck, which could be a problem whether or not the
>> previous bug is present, is single-threaded indexing, so every batch of
>> docs must wait for the previous batch to finish before it can begin, and
>> only one CPU gets utilized on the server side.  Both Solr and SolrJ are
>> fully capable of handling several indexing threads at once, and that is
>> really the only way to achieve maximum indexing performance.  If you
>> want multi-threaded (parallel) indexing, you must create the threads on
>> the client side, or run multiple indexing processes that each handle
>> part of the job.  Multi-threaded code is not easy to write correctly.
>>
>> The fieldTypes and analysis that you have configured in your schema may
>> include classes that process very slowly, or may include so many filters
>> that the end result is slow performance.  I am not familiar with the
>> performance of the classes that Solr includes, so I would not be able to
>> look at a schema and tell you which entries are slow.  As Erick
>> mentioned, processing for 300+ fields could be one reason for slow
>> indexing.
>>
>> If you are doing a commit operation for every batch, that will slow it
>> down even more.  If you have autoSoftCommit configured with a very low
>> maxTime or maxDocs value, that can result in extremely frequent commits
>> that make indexing much slower.  Although frequent autoCommit is very
>> much desirable for good operation (as long as openSearcher set to
>> false), commits that open new searchers should be much less frequent.
>> The best option is to only commit (with a new searcher) *once* at the
>> end of the indexing run.  If automatic soft commits are desired, make
>> them happen as infrequently as you can.
>>
>> https://lucidworks.com/understanding-transaction-logs-
>> softcommit-and-commit-in-sorlcloud/
>>
>> Using CloudSolrClient will make single-threaded indexing fairly
>> efficient, by always sending documents to the correct shard leader.  FYI
>> -- your 500 document batches are split into smaller batches (which I
>> think are only 10 documents) that are directed to correct shard leaders
>> by CloudSolrClient.  Indexing with multiple threads becomes even more
>> important with these smaller batches.
>>
>> Note that with SolrJ, you will need to tweak the HttpClient creation, or
>> you will likely find that each SolrJ client object can only utilize two
>> threads to each Solr server.  The default per-route maximum connection
>> limit for HttpClient is 2, with a total connection limit of 20.
>>
>> This code snippet shows how I create a Solr client that can do many
>> threads (300 per route, 5000 total) and also has custom timeout settings:
>>
>> RequestConfig rc = 

Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread alexpusch
Single field. I'm iterating over the results once, and need each doc in
memory only for that single iteration. I need different fields from each doc
according to the algorithm state.
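For illustration, a minimal sketch of iterating the sorted results inside a SearchComponent (assuming Solr 6.x APIs; the field names are hypothetical, and docValues would probably be preferable to stored fields for values read on every hit):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

public class ResultIterationSketch {
    // Call from SearchComponent.process(ResponseBuilder rb) after the query has run.
    static void walkResults(ResponseBuilder rb) throws java.io.IOException {
        DocList docs = rb.getResults().docList;          // hits in their sorted order
        SolrIndexSearcher searcher = rb.req.getSearcher();
        Set<String> wanted = new HashSet<>(Arrays.asList("marks", "name"));  // hypothetical fields

        DocIterator it = docs.iterator();
        while (it.hasNext()) {
            int docId = it.nextDoc();                    // internal Lucene doc id
            Document stored = searcher.doc(docId, wanted);
            String marks = stored.get("marks");
            // feed the field values for this hit into the algorithm's state
        }
    }
}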


