6.4.0 collection leader election and recovery issues

2017-02-01 Thread Ravi Solr
Hello,
 Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12-hour
debugging spree!! Can somebody kindly help me out of this misery?

I have a set of 8 single-shard collections with 3 replicas each. As soon as I
updated the configs and started the servers, one of my collections got stuck
with no leader. I have restarted Solr to no avail, and I also tried to force a
leader via the collections API, but that didn't work either. I also see that,
from time to time, multiple Solr nodes go down all at the same time, and only a
restart resolves the issue.

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR
(recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr
x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1)
[c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:43.767 INFO
(zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])


Suspecting the worst, I backed up the index, renamed the collection's
data folder, and restarted the servers; this time the collection got a
proper leader. So is my index really corrupted? The Solr UI showed live nodes
just like the logs, but without any leader. Even with the leader issue
somewhat alleviated after renaming the data folder and letting Solr create
a new data folder, my servers did go down a couple of times.

I am not all that well versed with ZooKeeper... any trick to make ZooKeeper
pick a leader and be happy? Did anybody have Solr/ZooKeeper issues with
6.4.0?

Thanks

Ravi Kiran Bhaskar


How long for autoAddReplica?

2017-02-01 Thread Walter Underwood
I added a new node and shut down a node with a shard replica on it. It has been 
an hour and I don’t see any activity toward making a new replica.

The new node and the one I shut down are both 6.4. The rest of the 16-node 
cluster is 6.2.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: How to combine third party search data as top results ?

2017-02-01 Thread Joel Bernstein
Also, this presentation discusses the RankQuery (starting on slide 16):
http://www.slideshare.net/lucidworks/managed-search-presented-by-jacob-graves-getty-images

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 1, 2017 at 9:58 PM, Joel Bernstein  wrote:

> This type of ranking behavior is what the RankQuery is designed to do. A
> RankQuery allows you to inject your own TopDocs collector into the query
> and take full control of the ranking. It's more complex to implement
> though. Here is an example RankQuery implementation:
>
> https://github.com/apache/lucene-solr/blob/master/solr/
> core/src/java/org/apache/solr/search/ReRankQParserPlugin.java
>
> And the base class this extends:
>
> https://github.com/apache/lucene-solr/blob/master/solr/
> core/src/java/org/apache/solr/search/AbstractReRankQuery.java
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Feb 1, 2017 at 4:53 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>
>> I was going to say what Charlie said! I would trust Flax's work in this
>> area :)
>>
>> -Doug
>>
>> On Wed, Feb 1, 2017 at 3:10 PM shamik  wrote:
>>
>> > Charlie, thanks for sharing the information. I'm going to take a look
>> and
>> > get
>> > back to you.
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/How-to-combine-third-
>> party-search-data-as-top-results-tp4318116p4318349.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>>
>
>


Re: How to combine third party search data as top results ?

2017-02-01 Thread Joel Bernstein
This type of ranking behavior is what the RankQuery is designed to do. A
RankQuery allows you to inject your own TopDocs collector into the query
and take full control of the ranking. It's more complex to implement
though. Here is an example RankQuery implementation:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java

And the base class this extends:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/AbstractReRankQuery.java
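
If you don't need that level of control, the rerank parser can also be used
directly with no custom code. A request of roughly this shape (the query
values here are just an illustration) re-scores the top 1000 results of the
main query:

q=greetings&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=(hi hello hey hiya)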

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 1, 2017 at 4:53 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> I was going to say what Charlie said! I would trust Flax's work in this
> area :)
>
> -Doug
>
> On Wed, Feb 1, 2017 at 3:10 PM shamik  wrote:
>
> > Charlie, thanks for sharing the information. I'm going to take a look and
> > get
> > back to you.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/How-to-combine-
> third-party-search-data-as-top-results-tp4318116p4318349.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Collection will not replicate

2017-02-01 Thread Erick Erickson
It's worth a try to take down your entire cluster and bring one machine
back up at a time. There _may_ be something like a 3-minute wait
before the replicas on that machine come up: the leader
election process has a 180-second delay before the replicas on that
node take over leadership, to wait for the last known good leader to
come up.

Continue bringing one node up at a time and wait patiently until all
the replicas on it are green and you have a leader elected for each
shard. Bringing up the rest of the Solr nodes should be
quicker then.

Be sure to sequence things so you have known good Solr nodes come up
first for the shard that's wonky. By that I mean that the first node
you bring up for the leaderless shard should be the one with the best
chance of having a totally OK index.


Let's claim that the above does bring up a leader for each shard. If
you still have a replica that refuses to come up, use the
DELETEREPLICA command to remove it. Just for insurance, I'd take the
Solr node down after the DELETEREPLICA and remove the entire core
directory for the replica that didn't come up. Then restart the node
and use the ADDREPLICA collections API command to put it back.
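
For reference, those calls look something like this (the host, collection,
and replica names are placeholders):

http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node3
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1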

If none of that works, you could try hand-editing the state.json file
and _make_ one of the shards a leader (I'd do this with the Solr nodes
down), but that's not for the faint of heart.
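
If you do go the hand-editing route, the zkcli script that ships with Solr
can pull state.json down and push the edited copy back, something like this
(the zkhost value and collection name are placeholders):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd getfile /collections/mycollection/state.json /tmp/state.json
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /collections/mycollection/state.json /tmp/state.json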

Best,
Erick

On Wed, Feb 1, 2017 at 1:57 PM, Jeff Wartes  wrote:
> Sounds similar to a thread last year:
> http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html
>
>
>
> On 2/1/17, 7:49 AM, "tedsolr"  wrote:
>
> I have version 5.2.1. Short of an upgrade, are there any remedies?
>
>
> Erick Erickson wrote
> > What version of Solr? Since 5.4 there's been a FORCELEADER collections
> > API call that might help.
> >
> > I'd run it with the newly added replicas offline. You only want it to
> > have good replicas to choose from.
> >
> > Best,
> > Erick
> >
> > On Wed, Feb 1, 2017 at 6:48 AM, tedsolr 
>
> > tsmith@
>
> >  wrote:
> >> Update! I did find an error:
> >>
> >> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
> >> :org.apache.solr.common.SolrException: Error getting leader from zk for
> >> shard shard1
> >> 
> >> Caused by: org.apache.solr.common.SolrException: Could not get leader
> >> props
> >> at
> >> 
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
> >> at
> >> 
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
> >> ... 14 more
> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> >>
> >> When I view the cluster status I see that this shard does not have a
> >> leader.
> >> So it appears I need to force the leader designation to the "active"
> >> replica. How do I do that?
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> 
> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318265.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318283.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Renee Sun
thanks for your time!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Chris Hostetter

Renee: Huh ... so it sounds like something must have been wonky in your 
original install?

Glad it worked out for you, and thanks for following up.


: Date: Wed, 1 Feb 2017 15:09:54 -0700 (MST)
: From: Renee Sun 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: project related configsets need to be deployed in both data and
: solr install folders ?
: 
: Hi Chris,
: Since I have been playing with this install, I am not certain whether I have
: unknowingly messed up some other settings, and I want to avoid putting in a
: false Jira and wasting your time.
: 
: I wiped out everything on my Solr box and did a fresh install of Solr
: 6.4.0, and made sure my config file set is placed in the data folder
: (/myprojectdata/solr/data/configsets/myproject_configs). My Solr home is
: set to /myprojectdata/solr/data, and it is WORKING now.
: 
: I did not have to specify configSetBaseDir in solr.xml (the one in the data
: folder, /myprojectdata/solr/data/solr.xml, NOT the one in the install
: folder, /opt/solr/server/solr/solr.xml); the default correctly points at
: the Solr home, which is my data folder, and finds the config file set.
: 
: So there is no problem, everything works fine, and I can create a new core
: without any issue. There is no bug whatsoever.
: 
: Thank you for all your help!
: 
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318369.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss
http://www.lucidworks.com/


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Renee Sun
Hi Chris,
Since I have been playing with this install, I am not certain whether I have
unknowingly messed up some other settings, and I want to avoid putting in a
false Jira and wasting your time.

I wiped out everything on my Solr box and did a fresh install of Solr
6.4.0, and made sure my config file set is placed in the data folder
(/myprojectdata/solr/data/configsets/myproject_configs). My Solr home is
set to /myprojectdata/solr/data, and it is WORKING now.

I did not have to specify configSetBaseDir in solr.xml (the one in the data
folder, /myprojectdata/solr/data/solr.xml, NOT the one in the install folder,
/opt/solr/server/solr/solr.xml); the default correctly points at the Solr
home, which is my data folder, and finds the config file set.
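
For anyone who does need to override it, my understanding is that it is a
plain str element in solr.xml, along these lines (the path here is just my
layout; adjust to yours):

<solr>
  <str name="configSetBaseDir">/myprojectdata/solr/data/configsets</str>
</solr>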

So there is no problem, everything works fine, and I can create a new core
without any issue. There is no bug whatsoever.

Thank you for all your help!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318369.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection will not replicate

2017-02-01 Thread Jeff Wartes
Sounds similar to a thread last year:
http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html



On 2/1/17, 7:49 AM, "tedsolr"  wrote:

I have version 5.2.1. Short of an upgrade, are there any remedies?


Erick Erickson wrote
> What version of Solr? Since 5.4 there's been a FORCELEADER collections
> API call that might help.
> 
> I'd run it with the newly added replicas offline. You only want it to
> have good replicas to choose from.
> 
> Best,
> Erick
> 
> On Wed, Feb 1, 2017 at 6:48 AM, tedsolr 

> tsmith@

>  wrote:
>> Update! I did find an error:
>>
>> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
>> :org.apache.solr.common.SolrException: Error getting leader from zk for
>> shard shard1
>> 
>> Caused by: org.apache.solr.common.SolrException: Could not get leader
>> props
>> at
>> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
>> at
>> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
>> at
>> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
>> ... 14 more
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>>
>> When I view the cluster status I see that this shard does not have a
>> leader.
>> So it appears I need to force the leader designation to the "active"
>> replica. How do I do that?
>>
>>
>>
>> --
>> View this message in context:
>> 
http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318265.html
>> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318283.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: How to combine third party search data as top results ?

2017-02-01 Thread Doug Turnbull
I was going to say what Charlie said! I would trust Flax's work in this
area :)

-Doug

On Wed, Feb 1, 2017 at 3:10 PM shamik  wrote:

> Charlie, thanks for sharing the information. I'm going to take a look and
> get
> back to you.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-combine-third-party-search-data-as-top-results-tp4318116p4318349.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Erick Erickson
The termVectors and offsets aren't necessary; they can be beneficial
for speed reasons, so I'd defer them.

I ran a quick test on 6.0 with your definitions and it works just
fine. I did have to comment out your custom stopwords filter on the
indexing side, but unless you're substituting for pairs like you indicate
(and I don't see how you could be), that shouldn't matter. I am also
using the default stopwords.txt file and assuming you don't have
anything unusual there.

I specified hl.fl=field and q=field:1a, which worked, as did
hl.fl=field and hl.q=field:1a.

What that means is probably that you've changed "something" in your
setup that's causing this. I've often found it's easiest to just start
over and add one thing at a time in a test bed (my laptop, if you must
know) until I have my "aha" moment.

Do note that unless your hl parameters default to including the "text"
field (which they do from your example) you wouldn't get anything.

Plus, if you include "&debug=query" on the URL, the results can
sometimes shed light on what's actually happening as opposed to what
you expect ;)

Best,
Erick


On Wed, Feb 1, 2017 at 12:23 PM, Teague James  wrote:
> Hi Erick! Thanks for the reply. The goal is to get two-character terms like
> 1a, 1b, 2a, 2b, 3a, etc. highlighted in the documents. Additional
> testing shows that any alpha-numeric combo returns a blank highlight, 
> regardless of length. Thus, "pr0blem" will not highlight because of the zero 
> in the middle of the term.
>
> I came across a ServerFault article where it was suggested that the fieldType 
> must be tokenized in order for highlighting to work correctly. Setting the 
> field type to text_general was suggested as a solution. In my case the data 
> is stored as a string fieldType, which is then copied using copyField to a 
> field that has a fieldType of text_general, but I'm still not getting a good 
> highlight on terms like "1a". Highlighting works for any other 
> non-alpha-numeric term though.
>
> Other articles pointed to termVectors and termOffsets, but none of these 
> seemed to help. Here's my config:
>
> <field ... termPositions="true" termVectors="true" termOffsets="true" />
> <field ... multiValued="true"/>
>
> <fieldType name="text_general" ... positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer ... />
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
>             generateNumberParts="0" generateWordParts="0" />
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer ... />
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
>             generateNumberParts="0" generateWordParts="0" />
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>   </analyzer>
> </fieldType>
>
> In the solrconfig file highlighting is set to use the text field: <str name="hl.fl">text</str>
>
> Thoughts?
>
> Appreciate the help! Thanks!
>
> -Teague
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, February 1, 2017 2:49 PM
> To: solr-user 
> Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos
>
> How far into the text field are these tokens? The highlighter defaults to the 
> first 10K characters under control of hl.maxAnalyzedChars. It's vaguely 
> possible that the values happen to be farther along in the text than that. 
> Not likely, mind you but possible.
>
> Best,
> Erick
>
> On Wed, Feb 1, 2017 at 8:24 AM, Teague James  wrote:
>> Hello everyone! I'm still stuck on this issue and could really use
>> some help. I have a Solr 6.0.0 instance that is storing documents
>> peppered with text like "1a", "2e", "4c", etc. If I search the
>> documents for a word, "ms", "in", "the", etc., I get the correct
>> number of hits and the results are highlighted correctly in the
>> highlighting section. But when I search for "1a" or "2e" I get hits,
>> but the highlights are blank. Further testing revealed that the
>> highlighter fails to highlight any two-character alpha-numeric
>> value, such as n0, b1, 1z, etc.:
>> ...
>> <lst name="highlighting">
>>   <lst name="8667"/>
>> </lst>
>>
>> Where "8667" is the document ID of the record that had the hit, but no
>> highlight. Other searches, "ms" for example, return:
>> ...
>> <lst name="highlighting">
>>   <lst name="8667">
>>     <arr name="text">
>>       <str>... <em>MS</em> ...</str>
>>     </arr>
>>   </lst>
>> </lst>
>>
>> Why does highlighting fail for "1a" type searches? Any help is appreciated!
>> Thanks!
>>
>> -Teague James
>>
>


Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-01 Thread David Kramer


Some background:
· The data involved is catalog data, with three nested objects: 
Products, Items, and Skus, in that order. We have a docType field on each 
record as a differentiator.
· The "id" field in our data is unique within datatype, but not across 
datatypes. We added a "uuid" field in our program that generates the Solr 
import file that is the id prefixed by the first letter of the docType, like 
P12345. That makes the uuid field unique, and we have that as the uniqueKey in 
our schema.xml.
· We are trying to retrieve the parent Product and all child documents. As 
such, we are using the ChildDocTransformerFactory ([child...]) to retrieve the 
children along with the parent. We have not yet solved the problem of getting 
Skus nested within Items in the results, and we will have to figure that out at 
some point, but for now we get them flattened.
· We are building out the proof of concept for this. This is all new 
work, so we are free to change a lot.
· This is Solr 6.0.0, and we are importing in JSON format, if that 
matters
· I submitted this question to StackOverflow but haven’t gotten any answers 
yet.


Our data looks like this (I've removed some fields for simplicity):

{
  "id": 739063,
  "docType": "Product",
  "uuid": "P739063",
  "_childDocuments_": [
    {
      "id": 1537378,
      "price": 25.45,
      "color": "Blush",
      "docType": "Item",
      "productId": 739063,
      "uuid": "I1537378",
      "_childDocuments_": [
        {
          "id": 12799578,
          "size": "10",
          "width": "W",
          "docType": "Sku",
          "itemId": 1537378,
          "uuid": "S12799578"
        }
      ]
    }
  ]
}



The query to fetch all Products and their children nested inside them is 
q=docType:Product&fl=title,id,docType,[child parentFilter=docType:Product]. 
When I run that query, all is well, and it returns the first 10 rows. However, 
if I fetch more rows by adding, say, &rows=500, we get the error "Parent query 
yields document which is not matched by parents filter, docID=XXX".
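
Spelled out as a full request (the host and core name here are placeholders),
the failing case is:

http://localhost:8983/solr/catalog/select?q=docType:Product&fl=title,id,docType,[child parentFilter=docType:Product]&rows=500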

When we first saw that error, we discovered our id field was not unique across 
document types, so we added the uuid field as mentioned above, which is. We 
also added <uniqueKey>uuid</uniqueKey> in our schema.xml file, wiped the core, 
recreated it, and restarted Solr just to make sure it was in effect. We have 
double-checked and are sure that the uuid fields are unique.



In all the search results for that error that I've found, the OP did not have a 
field that could differentiate the different document types, but as you see, we 
do. Since both the query and the parentFilter are searching for docType:Product, 
I don't see how either could possibly return anything but parents. We've also 
tried adding childFilter=docType:Item and childFilter=docType:Sku, but that did 
not help. I also tried using title:* for the filter, since only products have 
titles.



Is there anything else we can try?

Any explanation of this?

Is it possible that it's not using uuid as the unique identifier even though 
it's specified in the schema.xml, and would that even cause this?

Thanks.




RE: Need help in Tika on SolrCloud

2017-02-01 Thread Anatharaman, Srinatha (Contractor)
Is there anyone who can help me with my issue?

Your help is much appreciated.

I figured out the problem but need a solution.

In my data-config file, tikaConfig.xml is not recognized by ZooKeeper
(   processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml" )

-Original Message-
From: Anatharaman, Srinatha (Contractor) 
[mailto:srinatha_ananthara...@comcast.com]
Sent: Wednesday, February 01, 2017 11:51 AM
To: solr-user@lucene.apache.org
Subject: RE: Need help in Tika on SolrCloud



Hi All,



I see the below code, which is causing my code not to work in SolrCloud:

  @Override
  public String getConfigDir() {
    throw new ZooKeeperException(
        ErrorCode.SERVER_ERROR,
        "ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode");
  }


https://github.com/apache/lucene-solr/blob/branch_6_3/solr/core/src/java/org/apache/solr/cloud/ZkSolrResourceLoader.java



Can someone help me with a workaround?



ERROR:

2017-02-01 16:39:55.932 ERROR (Thread-20) [c:dsearch s:shard2 r:core_node3 x:dsearch_shard2_replica2] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
        at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
        ... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:96)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:60)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.init(TikaEntityProcessor.java:76)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
        ... 6 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
        at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
        ... 12 more





Thanks,
~Sri

From: Anatharaman, Srinatha (Contractor)
Sent: Wednesday, February 01, 2017 10:04 AM
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: Need help in Tika on SolrCloud

Hi,

I am new to Solr. I have implemented Solr on a single node and my code is
working well. When I move the same code to SolrCloud it fails (I made a few
changes for SolrCloud).

I am trying to load data using the DataImportHandler, but it throws the error
below:

2017-02-01 03:23:07.727 ERROR (Thread-18) [c:dsearch s:shard2 r:core_node1 x:dsearch_shard2_replica1] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
        at 

RE: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Teague James
Hi Erick! Thanks for the reply. The goal is to get two-character terms like 1a, 
1b, 2a, 2b, 3a, etc. highlighted in the documents. Additional testing 
shows that any alpha-numeric combo returns a blank highlight, regardless of 
length. Thus, "pr0blem" will not highlight because of the zero in the middle of 
the term.

I came across a ServerFault article where it was suggested that the fieldType 
must be tokenized in order for highlighting to work correctly. Setting the 
field type to text_general was suggested as a solution. In my case the data is 
stored as a string fieldType, which is then copied using copyField to a field 
that has a fieldType of text_general, but I'm still not getting a good 
highlight on terms like "1a". Highlighting works for any other 
non-alpha-numeric term though.

Other articles pointed to termVectors and termOffsets, but none of these seemed 
to help. Here's my config:

<field ... termPositions="true" termVectors="true" termOffsets="true" />
<field ... multiValued="true"/>

<fieldType name="text_general" ... positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer ... />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
            generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer ... />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
            generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
  </analyzer>
</fieldType>
In the solrconfig file highlighting is set to use the text field: <str name="hl.fl">text</str>

Thoughts?

Appreciate the help! Thanks!

-Teague

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, February 1, 2017 2:49 PM
To: solr-user 
Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

How far into the text field are these tokens? The highlighter defaults to the 
first 10K characters under control of hl.maxAnalyzedChars. It's vaguely 
possible that the values happen to be farther along in the text than that. Not 
likely, mind you but possible.

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James  wrote:
> Hello everyone! I'm still stuck on this issue and could really use 
> some help. I have a Solr 6.0.0 instance that is storing documents 
> peppered with text like "1a", "2e", "4c", etc. If I search the 
> documents for a word, "ms", "in", "the", etc., I get the correct 
> number of hits and the results are highlighted correctly in the 
> highlighting section. But when I search for "1a" or "2e" I get hits, 
> but the highlights are blank. Further testing revealed that the 
> highlighter fails to highlight any two-character alpha-numeric
> value, such as n0, b1, 1z, etc.:
> ...
> <lst name="highlighting">
>   <lst name="8667"/>
> </lst>
>
> Where "8667" is the document ID of the record that had the hit, but no 
> highlight. Other searches, "ms" for example, return:
> ...
> <lst name="highlighting">
>   <lst name="8667">
>     <arr name="text">
>       <str>... <em>MS</em> ...</str>
>     </arr>
>   </lst>
> </lst>
>
> Why does highlighting fail for "1a" type searches? Any help is appreciated!
> Thanks!
>
> -Teague James
>



Re: How to combine third party search data as top results ?

2017-02-01 Thread shamik
Charlie, thanks for sharing the information. I'm going to take a look and get
back to you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-combine-third-party-search-data-as-top-results-tp4318116p4318349.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Erick Erickson
How far into the text field are these tokens? The highlighter defaults
to the first 10K characters under control of hl.maxAnalyzedChars. It's
vaguely possible that the values happen to be farther along in the
text than that. Not likely, mind you, but possible.
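
A quick way to rule that out is to raise it on the request itself, e.g.
(the core name here is a placeholder):

http://localhost:8983/solr/mycore/select?q=text:1a&hl=true&hl.fl=text&hl.maxAnalyzedChars=1000000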

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James  wrote:
> Hello everyone! I'm still stuck on this issue and could really use some
> help. I have a Solr 6.0.0 instance that is storing documents peppered with
> text like "1a", "2e", "4c", etc. If I search the documents for a word, "ms",
> "in", "the", etc., I get the correct number of hits and the results are
> highlighted correctly in the highlighting section. But when I search for
> "1a" or "2e" I get hits, but the highlights are blank. Further testing
> revealed that the highlighter fails to highlight any combination of
> two-character alpha-numeric values, such as n0, b1, 1z, etc.:
> ...
> <lst name="highlighting">
>   <lst name="8667"/>
> </lst>
>
> Where "8667" is the document ID of the record that had the hit, but no
> highlight. Other searches, "ms" for example, return:
> ...
> <lst name="highlighting">
>   <lst name="8667">
>     <arr name="text">
>       <str>... <em>MS</em> ...</str>
>     </arr>
>   </lst>
> </lst>
>
> Why does highlighting fail for "1a" type searches? Any help is appreciated!
> Thanks!
>
> -Teague James
>


Re: Do long auto commit times interfere with delete?

2017-02-01 Thread Erick Erickson
This should work fine. There is no requirement that a commit happen
between updates and deletes for the same document. That said, this can
be tricky, so if you have a demonstrable case where this isn't so, let
us know.
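
Note that both operations can also travel in a single JSON update request,
and they are applied in the order they appear in the body, e.g. (the
collection name is a placeholder and the doc is abbreviated):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/products/update' \
  -d '{"delete": {"query": "product.id:(33624518)"}, "add": {"doc": {"id": "product.33624518"}}}'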

Best,
Erick


On Wed, Feb 1, 2017 at 9:53 AM, Hans Zhou  wrote:
> We have a solr cloud with a pretty long auto commit time (30 seconds for a 
> hard commit, 180 seconds for a soft commit).
>
> We’re also doing lots of delete-then-insert operations.
> i.e. Each document has a product.id, and to update a product, we do an update 
> request with
>
> {
>   "delete": {
>     "query": "product.id:(33624518)"
>   },
>   "add": [
>     {
>       "id": "product.33624518",
>       // … etc etc
>     }
>   ]
> }
>
> If we do multiple back-to-back index operations inside an auto-commit time 
> window, will the delete query fail to pick up the previously indexed 
> documents?
>


Re: Query.extractTerms dissapeared from 5.1.0 to 5.2.0

2017-02-01 Thread Max Bridgewater
Perfect. Thanks a lot.

On Wed, Feb 1, 2017 at 2:01 PM, Alan Woodward  wrote:

> Hi, extractTerms() is now on Weight rather than on Query.
>
> Alan
>
> > On 1 Feb 2017, at 17:43, Max Bridgewater 
> wrote:
> >
> > Hi,
> >
> > It seems Query.extractTerms() disappeared from 5.1.0 (
> > http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/
> search/Query.html)
> > to 5.2.0 (
> > http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/
> search/Query.html
> > ).
> >
> > However, I cannot find any comment on it in 5.2.0 release notes. Any
> > recommendation on what I should use in place of that method? I am
> migrating
> > some legacy code from Solr 4 to Solr 6.
> >
> > Thanks,
> > Max.
>
>


Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Joe Obernberger
I brought down the whole cluster again, and brought up one server at a 
time, waiting for it to go green before launching another. Now all 
replicas are OK, including the one that was in the perma-recovery mode 
before.  I do notice a large amount of network activity (basically 
pegging the interface) when a node is brought up.  I suspect this is 
especially true since these nodes are not dataNodes in HDFS.



-Joe


On 2/1/2017 1:37 PM, Alessandro Benedetti wrote:

I can't debug the code now, but if you access the logs directly (not
from the UI), is there any "Caused by" associated with the recovery
failure exception?
Cheers

On 1 Feb 2017 6:28 p.m., "Joe Obernberger" 
wrote:


In HDFS when a node fails it will leave behind write.lock files in HDFS.
These files have to be manually removed; otherwise the shards/replicas that
have write.lock files left behind will not start.  Since I can't tell which
physical node is hosting which shard/replica, I stop all the nodes, delete
all the write.lock files in HDFS and restart.

You are correct - only one replica is failing to start.  The other
replicas on the same physical node are coming up OK. Picture is worth a
thousand words so:
http://lovehorsepower.com/images/Cluster1.jpg

Errors:
http://lovehorsepower.com/images/ClusterSolr2.jpg

-Joe

On 2/1/2017 1:20 PM, Alessandro Benedetti wrote:


Ok, it is clearer now.
You have 9 Solr nodes running, one per physical machine.
So each node has a number of cores (both replicas and leaders).
When the node died, you got a lot of indexes corrupted.
I still miss why you restarted the other 8 working nodes (I was expecting
you to restart only the failed one).

When you mention that only one replica  is failing,  you mean that the
solr
node is up and running and only  one solr core ( the replica of one shard)
   keeps failing?
Or all the local cores in that node are failing  to recover?

Cheers

On 1 Feb 2017 6:07 p.m., "Joe Obernberger" 
wrote:

Thank you for the response.
There are no virtual machines in the configuration.  The collection has 45
shards with 3 replicas each spread across the 9 physical boxes; each box
is
running one copy of solr.  I've tried to restart just the one node after
the other 8 (and all their shards/replicas) came up, but this one replica
seems to be in perma-recovery.

Shard Count: 45
replicationFactor: 3
maxShardsPerNode: 50
router: compositeId
autoAddReplicas: false

SOLR_JAVA_MEM options are -Xms16g -Xmx32g

_TUNE is:
"-XX:+UseG1GC \
-XX:MaxDirectMemorySize=8g
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=32m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:ParallelGCThreads=16 \
-XX:+UseLargePages \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"

So far it has retried 22 times.  The cluster is accessible and OK, but I'm
afraid to continue indexing data if this one node will never come back.
Thanks for help!

-Joe



On 2/1/2017 12:58 PM, alessandro.benedetti wrote:

Let me try to summarize .

How many virtual machines on top of the 9 physical ?
How many Solr processes ( replicas ? )

If you had 1 node compromised.
I assume you have replicas as well right ?

Can you explain a little bit better your replicas configuration ?
Why you had to stop all the nodes ?

I would expect the stop of the solr node failing, cleanup of the index
and
restart.
Automatically it would recover from the leader.

Something is suspicious here, let us know !

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble
.com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Query.extractTerms dissapeared from 5.1.0 to 5.2.0

2017-02-01 Thread Alan Woodward
Hi, extractTerms() is now on Weight rather than on Query.
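
The migration is mechanical; roughly (a sketch, assuming you already have an
IndexSearcher and a Query in hand):

Set<Term> terms = new HashSet<>();
// false = scores are not needed just to extract terms
Weight weight = searcher.createNormalizedWeight(query, false);
weight.extractTerms(terms);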

Alan

> On 1 Feb 2017, at 17:43, Max Bridgewater  wrote:
> 
> Hi,
> 
> It seems Query.extractTerms() disappeared from 5.1.0 (
> http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html)
> to 5.2.0 (
> http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html
> ).
> 
> However, I cannot find any comment on it in 5.2.0 release notes. Any
> recommendation on what I should use in place of that method? I am migrating
> some legacy code from Solr 4 to Solr 6.
> 
> Thanks,
> Max.



Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Joe Obernberger

Thank you. I do not see any "Caused by" block in the solr.log.

---

2017-02-01 18:37:57.566 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Replay not 
started, or was not successful... still buffering updates.
2017-02-01 18:37:57.566 ERROR 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Recovery 
failed - trying again... (50)
2017-02-01 18:37:57.566 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Wait [12.0] 
seconds before trying to recover again (attempt=51)
2017-02-01 18:38:57.567 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Begin 
buffering updates. core=[Worldline2New_shard22_replica2]
2017-02-01 18:38:57.567 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.u.UpdateLog Restarting 
buffering. previous=RecoveryInfo{adds=0 deletes=0 deleteByQuery=0 
errors=0 positionOfStart=0}
2017-02-01 18:38:57.567 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.u.UpdateLog Starting to buffer 
updates. HDFSUpdateLog{state=BUFFERING, tlog=null}
2017-02-01 18:38:57.567 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Publishing 
state of core [Worldline2New_shard22_replica2] as recovering, leader is 
[http://cordelia:9100/solr/Worldline2New_shard22_replica1/] and I am 
[http://bilbo:9100/solr/Worldline2New_shard22_replica2/]
2017-02-01 18:38:57.586 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Sending prep 
recovery command to [http://cordelia:9100/solr]; [WaitForState: 
action=PREPRECOVERY&core=Worldline2New_shard22_replica1&nodeName=bilbo:9100_solr&coreNodeName=core_node34&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
2017-02-01 18:38:57.644 INFO 
(zkCallback-5-thread-49-processing-n:bilbo:9100_solr) [   ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/Worldline2New/state.json] for collection 
[Worldline2New] has occurred - updating... (live nodes size: [9])
2017-02-01 18:39:04.594 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Starting 
Replication Recovery.
2017-02-01 18:39:04.594 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.c.RecoveryStrategy Attempting to 
replicate from [http://cordelia:9100/solr/Worldline2New_shard22_replica1/].
2017-02-01 18:39:04.604 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.h.IndexFetcher Master's 
generation: 12398
2017-02-01 18:39:04.612 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.h.IndexFetcher Master's version: 
1485965089535
2017-02-01 18:39:04.612 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 
r:core_node34) [c:Worldline2New s:shard22 r:core_node34 
x:Worldline2New_shard22_replica2] o.a.s.h.IndexFetcher Slave's 
generation: 12358
2017-02-01 18:39:04.612 INFO 
(recoveryExecutor-3-thread-8-processing-n:bilbo:9100_solr 
x:Worldline2New_shard22_replica2 s:shard22 c:Worldline2New 

Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Alessandro Benedetti
I can't debug the code now, but if you access the logs directly (not
from the UI), is there any "Caused by" associated with the recovery
failure exception?
Cheers

On 1 Feb 2017 6:28 p.m., "Joe Obernberger" 
wrote:

> In HDFS when a node fails it will leave behind write.lock files in HDFS.
> These files have to be manually removed; otherwise the shards/replicas that
> have write.lock files left behind will not start.  Since I can't tell which
> physical node is hosting which shard/replica, I stop all the nodes, delete
> all the write.lock files in HDFS and restart.
>
> You are correct - only one replica is failing to start.  The other
> replicas on the same physical node are coming up OK. Picture is worth a
> thousand words so:
> http://lovehorsepower.com/images/Cluster1.jpg
>
> Errors:
> http://lovehorsepower.com/images/ClusterSolr2.jpg
>
> -Joe
>
> On 2/1/2017 1:20 PM, Alessandro Benedetti wrote:
>
>> Ok, it is clearer now.
>> You have 9 Solr nodes running, one per physical machine.
>> So each node has a number of cores (both replicas and leaders).
>> When the node died, you got a lot of indexes corrupted.
>> I still miss why you restarted the other 8 working nodes (I was expecting
>> you to restart only the failed one).
>>
>> When you mention that only one replica  is failing,  you mean that the
>> solr
>> node is up and running and only  one solr core ( the replica of one shard)
>>   keeps failing?
>> Or all the local cores in that node are failing  to recover?
>>
>> Cheers
>>
>> On 1 Feb 2017 6:07 p.m., "Joe Obernberger" 
>> wrote:
>>
>> Thank you for the response.
>> There are no virtual machines in the configuration.  The collection has 45
>> shards with 3 replicas each spread across the 9 physical boxes; each box
>> is
>> running one copy of solr.  I've tried to restart just the one node after
>> the other 8 (and all their shards/replicas) came up, but this one replica
>> seems to be in perma-recovery.
>>
>> Shard Count: 45
>> replicationFactor: 3
>> maxShardsPerNode: 50
>> router: compositeId
>> autoAddReplicas: false
>>
>> SOLR_JAVA_MEM options are -Xms16g -Xmx32g
>>
>> _TUNE is:
>> "-XX:+UseG1GC \
>> -XX:MaxDirectMemorySize=8g
>> -XX:+PerfDisableSharedMem \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=32m \
>> -XX:MaxGCPauseMillis=500 \
>> -XX:InitiatingHeapOccupancyPercent=75 \
>> -XX:ParallelGCThreads=16 \
>> -XX:+UseLargePages \
>> -XX:-ResizePLAB \
>> -XX:+AggressiveOpts"
>>
>> So far it has retried 22 times.  The cluster is accessible and OK, but I'm
>> afraid to continue indexing data if this one node will never come back.
>> Thanks for help!
>>
>> -Joe
>>
>>
>>
>> On 2/1/2017 12:58 PM, alessandro.benedetti wrote:
>>
>> Let me try to summarize .
>>> How many virtual machines on top of the 9 physical ?
>>> How many Solr processes ( replicas ? )
>>>
>>> If you had 1 node compromised.
>>> I assume you have replicas as well right ?
>>>
>>> Can you explain a little bit better your replicas configuration ?
>>> Why you had to stop all the nodes ?
>>>
>>> I would expect the stop of the solr node failing, cleanup of the index
>>> and
>>> restart.
>>> Automatically it would recover from the leader.
>>>
>>> Something is suspicious here, let us know !
>>>
>>> Cheers
>>>
>>>
>>>
>>> -
>>> ---
>>> Alessandro Benedetti
>>> Search Consultant, R&D Software Engineer, Director
>>> Sease Ltd. - www.sease.io
>>> --
>>> View this message in context: http://lucene.472066.n3.nabble
>>> .com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>


Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Joe Obernberger
When a node fails it will leave behind write.lock files in HDFS. These 
files have to be manually removed; otherwise the shards/replicas that 
have write.lock files left behind will not start. Since I can't tell 
which physical node is hosting which shard/replica, I stop all the 
nodes, delete all the write.lock files in HDFS, and restart.
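
For the cleanup itself, once the nodes are down, something like the
following works (the glob is illustrative and depends on your
solr.hdfs.home layout):

hdfs dfs -rm '/solr/*/core_node*/data/index/write.lock'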


You are correct - only one replica is failing to start.  The other 
replicas on the same physical node are coming up OK. Picture is worth a 
thousand words so:

http://lovehorsepower.com/images/Cluster1.jpg

Errors:
http://lovehorsepower.com/images/ClusterSolr2.jpg

-Joe

On 2/1/2017 1:20 PM, Alessandro Benedetti wrote:

Ok, it is clearer now.
You have 9 Solr nodes running, one per physical machine.
So each node has a number of cores (both replicas and leaders).
When the node died, you got a lot of indexes corrupted.
I still miss why you restarted the other 8 working nodes (I was expecting
you to restart only the failed one).

When you mention that only one replica  is failing,  you mean that the solr
node is up and running and only  one solr core ( the replica of one shard)
  keeps failing?
Or all the local cores in that node are failing  to recover?

Cheers

On 1 Feb 2017 6:07 p.m., "Joe Obernberger" 
wrote:

Thank you for the response.
There are no virtual machines in the configuration.  The collection has 45
shards with 3 replicas each spread across the 9 physical boxes; each box is
running one copy of solr.  I've tried to restart just the one node after
the other 8 (and all their shards/replicas) came up, but this one replica
seems to be in perma-recovery.

Shard Count: 45
replicationFactor: 3
maxShardsPerNode: 50
router: compositeId
autoAddReplicas: false

SOLR_JAVA_MEM options are -Xms16g -Xmx32g

_TUNE is:
"-XX:+UseG1GC \
-XX:MaxDirectMemorySize=8g
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=32m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:ParallelGCThreads=16 \
-XX:+UseLargePages \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"

So far it has retried 22 times.  The cluster is accessible and OK, but I'm
afraid to continue indexing data if this one node will never come back.
Thanks for help!

-Joe



On 2/1/2017 12:58 PM, alessandro.benedetti wrote:


Let me try to summarize .
How many virtual machines on top of the 9 physical ?
How many Solr processes ( replicas ? )

If you had 1 node compromised.
I assume you have replicas as well right ?

Can you explain a little bit better your replicas configuration ?
Why you had to stop all the nodes ?

I would expect the stop of the solr node failing, cleanup of the index and
restart.
Automatically it would recover from the leader.

Something is suspicious here, let us know !

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: http://lucene.472066.n3.nabble
.com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Alessandro Benedetti
Ok, it is clearer now.
You have 9 Solr nodes running, one per physical machine.
So each node has a number of cores (both replicas and leaders).
When the node died, you got a lot of indexes corrupted.
I still miss why you restarted the other 8 working nodes (I was expecting
you to restart only the failed one).

When you mention that only one replica is failing, do you mean that the Solr
node is up and running and only one Solr core (the replica of one shard)
keeps failing?
Or are all the local cores in that node failing to recover?

Cheers

On 1 Feb 2017 6:07 p.m., "Joe Obernberger" 
wrote:

Thank you for the response.
There are no virtual machines in the configuration.  The collection has 45
shards with 3 replicas each spread across the 9 physical boxes; each box is
running one copy of solr.  I've tried to restart just the one node after
the other 8 (and all their shards/replicas) came up, but this one replica
seems to be in perma-recovery.

Shard Count: 45
replicationFactor: 3
maxShardsPerNode: 50
router: compositeId
autoAddReplicas: false

SOLR_JAVA_MEM options are -Xms16g -Xmx32g

_TUNE is:
"-XX:+UseG1GC \
-XX:MaxDirectMemorySize=8g
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=32m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:ParallelGCThreads=16 \
-XX:+UseLargePages \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"

So far it has retried 22 times.  The cluster is accessible and OK, but I'm
afraid to continue indexing data if this one node will never come back.
Thanks for help!

-Joe



On 2/1/2017 12:58 PM, alessandro.benedetti wrote:

> Let me try to summarize .
> How many virtual machines on top of the 9 physical ?
> How many Solr processes ( replicas ? )
>
> If you had 1 node compromised.
> I assume you have replicas as well right ?
>
> Can you explain a little bit better your replicas configuration ?
> Why you had to stop all the nodes ?
>
> I would expect the stop of the solr node failing, cleanup of the index and
> restart.
> Automatically it would recover from the leader.
>
> Something is suspicious here, let us know !
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.nabble
> .com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread Joe Obernberger

Thank you for the response.
There are no virtual machines in the configuration.  The collection has 
45 shards with 3 replicas each spread across the 9 physical boxes; each 
box is running one copy of solr.  I've tried to restart just the one 
node after the other 8 (and all their shards/replicas) came up, but this 
one replica seems to be in perma-recovery.


Shard Count: 45
replicationFactor: 3
maxShardsPerNode: 50
router: compositeId
autoAddReplicas: false

SOLR_JAVA_MEM options are -Xms16g -Xmx32g

_TUNE is:
"-XX:+UseG1GC \
-XX:MaxDirectMemorySize=8g
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=32m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:ParallelGCThreads=16 \
-XX:+UseLargePages \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"

So far it has retried 22 times.  The cluster is accessible and OK, but 
I'm afraid to continue indexing data if this one node will never come back.

Thanks for help!

-Joe


On 2/1/2017 12:58 PM, alessandro.benedetti wrote:

Let me try to summarize .
How many virtual machines on top of the 9 physical ?
How many Solr processes ( replicas ? )

If you had 1 node compromised.
I assume you have replicas as well right ?

Can you explain a little bit better your replicas configuration ?
Why you had to stop all the nodes ?

I would expect the stop of the solr node failing, cleanup of the index and
restart.
Automatically it would recover from the leader.

Something is suspicious here, let us know !

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr 6.3.0 - recovery failed

2017-02-01 Thread alessandro.benedetti
Let me try to summarize.
How many virtual machines on top of the 9 physical?
How many Solr processes (replicas?)

If you had 1 node compromised,
I assume you have replicas as well, right?

Can you explain your replica configuration a little bit better?
Why did you have to stop all the nodes?

I would expect the stop of the failing Solr node, cleanup of the index, and
restart.
Automatically it would recover from the leader.

Something is suspicious here, let us know!

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Kafka DIH

2017-02-01 Thread Susheel Kumar
Hello Joel,

This definitely seems like a good feature to add. In fact I was also
looking to push data into Solr from Kafka, and this would be a good feature
to have.

I have created JIRA https://issues.apache.org/jira/browse/SOLR-10086 and
can contribute as well.

Thanks,
Susheel

On Tue, Jan 31, 2017 at 6:28 PM, John Bickerstaff 
wrote:

> I wrote a simple java microservice that did this about a year ago...  It
> was pretty simple - and the kafka topic served as a way to re-create my
> collection from scratch without hitting the database again in the event of
> the Solr servers going down.
>
> The code just read from Kafka topic one by one and shipped to Solr in
> batches of 500 (between commits)
>
> (It was a small data set, I was lucky that way)
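
A minimal sketch of the kind of bridge John describes, using the Kafka
0.9/0.10 consumer API and SolrJ (the topic name, Solr URL, and field mapping
here are invented for illustration; adapt them to your own schema):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class KafkaToSolrBridge {
        private static final int BATCH_SIZE = 500;

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "solr-indexer");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/collection1")) {
                consumer.subscribe(Collections.singletonList("docs"));
                List<SolrInputDocument> batch = new ArrayList<>(BATCH_SIZE);
                while (true) {
                    // read records off the topic one by one
                    ConsumerRecords<String, String> records = consumer.poll(1000L);
                    for (ConsumerRecord<String, String> rec : records) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", rec.key());
                        doc.addField("body_txt", rec.value());
                        batch.add(doc);
                        if (batch.size() >= BATCH_SIZE) {
                            solr.add(batch);  // ship a batch of 500...
                            solr.commit();    // ...then commit, as described above
                            batch.clear();
                        }
                    }
                }
            }
        }
    }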
>
> On Tue, Jan 31, 2017 at 3:41 PM, Joel Bernstein 
> wrote:
>
> > This would make a great Streaming Expression as well. If you're
> interested
> > in working on this I'll help out along the way. Here is an example Stream
> > that connects to a JDBC data source:
> >
> > https://github.com/apache/lucene-solr/blob/master/solr/
> > solrj/src/java/org/apache/solr/client/solrj/io/stream/JDBCStream.java
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jan 31, 2017 at 12:07 PM, Mike Thomsen 
> > wrote:
> >
> > > Probably not, but writing your own little Java process to do it would
> be
> > > trivial with Kafka 0.9.X or 0.10.X. You can also look at the Confluent
> > > Platform as they have tons of connectors for Kafka to directly feed
> into
> > > other systems.
> > >
> > > On Mon, Jan 30, 2017 at 3:05 AM, Mahmoud Almokadem <
> > prog.mahm...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > Is there a way to get SolrCloud to pull data from a topic in Kafka
> > > > periodically using Dataimport Handler?
> > > >
> > > > Thanks
> > > > Mahmoud
> > >
> >
>


Do long auto commit times interfere with delete?

2017-02-01 Thread Hans Zhou
We have a solr cloud with a pretty long auto commit time (30 seconds for a hard 
commit, 180 seconds for a soft commit).

We’re also doing lots of delete-then-insert operations.
i.e. Each document has a product.id, and to update a product, we do an update 
request with

{
  "delete": {
    "query": "product.id:(33624518)"
  },
  "add": [
    {
      "id": "product.33624518",
      // … etc etc
    }
  ]
}

If we do multiple back-to-back index operations inside an auto-commit time 
window, will the delete query fail to pick up the previously indexed documents?





Solr 6.3.0 - recovery failed

2017-02-01 Thread Joe Obernberger
Hi All - I had one node in a 45 shard cluster (9 physical machines) run 
out of memory.  I stopped all the nodes in the cluster and removed any 
lingering write.lock files from the OOM in HDFS.  All the nodes 
recovered except one replica of one shard that happens to be on the node 
that ran out of memory.  The error is:


Error while trying to recover:org.apache.solr.common.SolrException: 
Replication for recovery failed.
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:159)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Anything I can check?  The index is stored in HDFS.  It seems to keep 
looping retrying over and over.


Thank you!

-Joe



Query.extractTerms disappeared from 5.1.0 to 5.2.0

2017-02-01 Thread Max Bridgewater
Hi,

It seems Query.extractTerms() disappeared from 5.1.0 (
http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/Query.html)
to 5.2.0 (
http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/Query.html
).

However, I cannot find any comment on it in 5.2.0 release notes. Any
recommendation on what I should use in place of that method? I am migrating
some legacy code from Solr 4 to Solr 6.
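
For reference, a hedged sketch of the migration path — my reading of the 5.2
change is that term extraction moved from Query onto Weight, so the old call
is replaced by going through the searcher (verify against the 5.2 MIGRATE
notes before relying on this):

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Weight;

    public final class TermExtraction {
        // before (4.x / 5.1): query.extractTerms(terms);
        // after (5.2+): extraction lives on Weight
        static Set<Term> extractTerms(IndexSearcher searcher, Query query) throws IOException {
            Set<Term> terms = new HashSet<>();
            Weight weight = searcher.createNormalizedWeight(query, false); // false = scores not needed
            weight.extractTerms(terms);
            return terms;
        }
    }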

Thanks,
Max.


Re: Arabic words search in solr

2017-02-01 Thread mohanmca01
Dear Steve, thanks for investigating our problem. Our project is basically a
business directory search platform, and we have more than 100K business
records. I'm providing some examples of Arabic words to reproduce the
problem; please find the attached Word file (arabicSearch.docx), where I
explain everything along with screenshots.

Regarding upgrading to the latest version: our project runs on Java 1.7,
and if I need to upgrade Solr then we would also have to upgrade Java, the
JBoss application server, etc., which is not something we can take on right
now.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4318227.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Chris Hostetter

Based on your description of the problem, and the fact that explicitly 
setting configSetBaseDir in your solr.xml works, I suspect there is some 
sort of weird bug in how the "default" configSetBaseDir is determined in 
some diff code paths in Solr.

We should definitely file a jira issue tracking this -- I would have done 
so already, but to be completely honest: I got a little lost in reading 
your initial email as far as how you have solr installed, and how you are 
running it.

Can you please create a new "Solr" jira here...

https://issues.apache.org/jira/secure/CreateIssue!default.jspa

... and include the following details...

1) a description of how you installed solr - directory listings of 
the various dirs making it clear which files are where would be helpful.  

2) a copy of the solr.in.sh & solr.xml files for each "project"

3) an example of exactly how you start solr (ie: the exact bin/solr 
command you run)

4) the exact error you get when creating a collection depending on which 
directory does/doesn't contain a configset.

(Ideally: to make it really clear what's going on, it would be great if 
you could run this test with a "foo" configset in one dir, and a "bar" 
configset in the other dir, and show us what the diff error messages look 
like depending on which configset you try to use)

5) note how things change when you add configSetBaseDir



Does that make sense?





: Date: Tue, 31 Jan 2017 16:38:33 -0700 (MST)
: From: Renee Sun 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: project related configsets need to be deployed in both data and
: solr install folders ?
: 
: Thanks Erick!
: 
: I looked at solr twiki though if configSetBaseDir is not set, the default
: should be SOLR_HOME/configsets:
: 
: configSetBaseDir
: 
:   The directory under which configsets for solr cores can be found.
:   Defaults to SOLR_HOME/configsets
: 
: and I do have my solr started with :
: 
: -Dsolr.solr.home=/myprojectdata/solr/data
: 
: I also deploy my config into:
: 
: /myprojectdata/solr/data/configsets/myproject_configs
: 
: anyways, looks like the default is not working?
: 
: I found this https://issues.apache.org/jira/browse/SOLR-6158, which seems to
: talk about the configSetBaseDir issue ...
: 
: I do set configSetBaseDir in solr.xml and it works now. Just wonder why the
: default wont work. Or I might did something else wrong.
: 
: 
: 
: 
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318163.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss
http://www.lucidworks.com/
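
For anyone hitting the same default-resolution problem, the explicit
workaround Renee describes is a single element in solr.xml (the element name
is the documented one; the path below is hers):

    <solr>
      <str name="configSetBaseDir">/myprojectdata/solr/data/configsets</str>
      ...
    </solr>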


RE: Need help in Tika on SolrCloud

2017-02-01 Thread Anatharaman, Srinatha (Contractor)
Hi All,

I see the code below, which is what causes my code not to work in SolrCloud:

  @Override
  public String getConfigDir() {
    throw new ZooKeeperException(
        ErrorCode.SERVER_ERROR,
        "ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode");
  }



https://github.com/apache/lucene-solr/blob/branch_6_3/solr/core/src/java/org/apache/solr/cloud/ZkSolrResourceLoader.java

Can someone help me with a workaround?

ERROR :
2017-02-01 16:39:55.932 ERROR (Thread-20) [c:dsearch s:shard2 r:core_node3 
x:dsearch_shard2_replica2] o.a.s.h.d.DataImporter Full Import 
failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to load Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:96)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:60)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.init(TikaEntityProcessor.java:76)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 6 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are 
trying to do is not supported in ZooKeeper mode
at 
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
... 12 more


Thanks,
~Sri

From: Anatharaman, Srinatha (Contractor)
Sent: Wednesday, February 01, 2017 10:04 AM
To: 'solr-user@lucene.apache.org' 
Subject: Need help in Tika on SolrCloud

Hi,

I am new to Solr. I have implemented Solr on a single node and my code works 
well.
When I move the same code to SolrCloud it fails (I made a few changes for 
SolrCloud).

I am trying to load data using Dataimporthandler but it throws error as below

2017-02-01 03:23:07.727 ERROR (Thread-18) [c:dsearch s:shard2 r:core_node1 
x:dsearch_shard2_replica1] o.a.s.h.d.DataImporter Full Import 
failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to load Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 

Re: DIH - Parent-Child-Problems - GraphQuery-Or-BlockJoin - Order with Orderlines

2017-02-01 Thread Mikhail Khludnev
Have you checked
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
?
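
For the FORK lookup in the original post, a block-join request could look
like the sketch below. This assumes the orderlines are actually indexed as
child documents of their order (which the posted DIH config does not do by
itself) and that a content_type field marks the parents — both assumptions:

    q={!parent which="content_type:order"}productname:FORK
    fl=customername,[child parentFilter=content_type:order childFilter=productname:FORK]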

On 1 Feb 2017 at 10:42, "Kent Iversen" 
wrote:
написал:

> I'm a newbie to Solr and can't seem to get this to work properly. Gonna
> use Order with Orderlines as an example.
> And I'm using Solr 6.3.0.
>
>
> ORDER
> +---------+--------------+
> | orderid | customername |
> +---------+--------------+
> |       1 | ACME         |
> |       2 | LOREM IPSUM  |
> +---------+--------------+
>
> ORDERLINE
> +-------------+---------+-------------+----------+
> | orderlineid | orderid | productname | quantity |
> +-------------+---------+-------------+----------+
> |           1 |       1 | KNIFE       |       10 |
> |           2 |       1 | FORK        |       15 |
> |           3 |       1 | SPOON       |       12 |
> |           4 |       2 | FORK        |       25 |
> +-------------+---------+-------------+----------+
>
> Now, I would like to return the customer name and the products and quantity
> when querying for either the customer name or a product name.
>
> 1)
> QUERY for customer name = "ACME"
> Wanted result (pseudo style):
>
> Parent: ACME
>
> Child:
> {productname:KNIFE},
> {quantity:10}
>
> Child:
> {productname:FORK},
> {quantity:15}
>
> Child:
> {productname:SPOON},
> {quantity:12}
>
> 2)
> QUERY for product name = "FORK"
> Wanted result (pseudo style):
>
> Parent: ACME
>
> Child:
> {productname : FORK},
> {quantity : 15}
>
>
> Parent: LOREM IPSUM
>
> Child:
> {productname : FORK},
> {quantity : 25}
>
>
> Actual response when querying for orderid:1 (in the query editor in Solr)
>
> (http://localhost:8983/solr/order/select?indent=on&q=orderid:1&wt=json)
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0,
>     "params":{
>       "q":"orderid:1",
>       "indent":"on",
>       "wt":"json",
>       "_":"1485913597768"}},
>   "response":{"numFound":3,"start":0,"docs":[
>     {
>       "orderlineid":1,
>       "orderid":1,
>       "quantity":10,
>       "productname":"KNIFE"},
>     {
>       "orderlineid":2,
>       "orderid":1,
>       "quantity":15,
>       "productname":"FORK"},
>     {
>       "orderlineid":3,
>       "orderid":1,
>       "quantity":12,
>       "productname":"SPOON"},
>     {
>       "orderid":1,
>       "customername":"ACME",
>       "_version_":1558094660321148928}]
>   }}
>
> I don't see the _child_ marker here, so it doesn't seem like what I have
> seen other people get in the response.
>
> 1) What is wrong with my configuration? How should it be?
> There is a relationship between order and orderline, and that is the order
> id, which is defined as a uniqueKey in managed-schema.xml
> 2) How can I get the customer name, product name and quantity when
> searching either for customer name, or product name? For instance, which
> customer bought FORKS?
> 3) How to use or append either GraphQuery or BlockJoin to give me back the
> result I want?
>
> Please advise
>
> Regards
> Kent
>
>
>
> solrconfig.xml
> --
>
> <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler-extras/lib"
>      regex=".*\.jar" />
> <lib dir="..." regex="solr-dataimporthandler-\d.*\.jar" />
> ...
>
> <requestHandler name="..."
>                 class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>   </lst>
> </requestHandler>
>
>
> data-config.xml
> --
> <dataConfig>
>   <dataSource ... />
>   <document>
>     <entity name="order" query="...">
>       <field ... />
>       <entity name="orderline"
>               query="select ol.orderlineid, ol.orderid, ol.productname,
>                      ol.quantity from ORDERLINE ol where ol.orderid = ${order.orderid}">
>         <field ... />
>         ...
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
> managed-data.xml
> --
>
> <field name="..." stored="false"/>
> <field name="..." stored="false" docValues="false" />
> <field name="..." stored="false" multiValued="true"/>
>
> <field name="..." multiValued="false" />
> <field name="..." multiValued="false" />
> <field name="..." multiValued="false" />
> <field name="..." multiValued="false" />
> <field name="..." multiValued="false" />
>
> <uniqueKey>orderid</uniqueKey>
>
> <fieldType name="..." class="..." sortMissingLast="true" docValues="true" />
> <fieldType name="..." class="..." multiValued="true" docValues="true" />
>
> <fieldType name="..." class="..." sortMissingLast="true"/>
>
> <fieldType name="..." class="..." sortMissingLast="true" multiValued="true"/>
>
> <fieldType name="..." class="..." precisionStep="0" positionIncrementGap="0"/>
> <fieldType name="..." class="..." precisionStep="0" positionIncrementGap="0"/>
> <fieldType name="..." class="..." precisionStep="0" positionIncrementGap="0"/>
> <fieldType name="..." class="..." precisionStep="0" positionIncrementGap="0"/>
>
> <fieldType name="..." class="..." precisionStep="0"

RE: Return specific field from child documents.

2017-02-01 Thread Mikhail Khludnev
OK, let's add
emp.logParamsList=q,fl,rows,row.id
and check the logs for this request?

On 31 Jan 2017 at 14:21, "Preeti Bhat" 
wrote:
написал:

Same result.

Thanks,
Preeti


-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Tuesday, January 31, 2017 3:15 PM
To: solr-user
Subject: Re: Return specific field from child documents.

ok. what about emp.q={!child of=content_type:parent}{!terms f=id v=$row.id}

On Tue, Jan 31, 2017 at 11:22 AM, Preeti Bhat 
wrote:

> Hi Mikhail,
>
> Thanks for reply, but It doesn't seem to work.
>
> http://localhost:8984/solr/Contact/select?fl=id,FirstName,emp:[subquery]
> &emp.rows=10&emp.fl=email
> &emp.q={!child of=content_type:parent}{!term f=contact_id v=$row.id}
> &indent=on&q={!parent which="content_type:parent"} email:"tempe...@tempest.com"
> &wt=json&expandMacros=true
>
> I am getting the below response.
>
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":0,
> "params":{
>   "emp.fl":"email",
>   "q":"{!parent which=\"content_type:parent\"} email:\"
> tempe...@tempest.com\"",
>   "emp.rows":"10",
>   "indent":"on",
>   "expandMacros":"true",
>   "fl":"id,FirstName,emp:[subquery]",
>   "emp.q":"{!child of=content_type:parent}{!term f=contact_id v=$
> row.id}",
>   "wt":"json"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"7888",
> "FirstName":"temptest"}]
>   }}
>
> Thanks and Regards,
> Preeti Bhat
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesday, January 31, 2017 1:34 PM
> To: solr-user
> Subject: Re: Return specific field from child documents.
>
> it should be something like
> emp.q={!child of=content_type:parent}{!terms f=contact_id v=$row.id}
>
> On Tue, Jan 31, 2017 at 10:30 AM, Preeti Bhat
> 
> wrote:
>
> > Hi Mikhail,
> >
> > I was trying to restrict the data from child documents due to one of
> > the request of the client where they would need the specific fields
> > as
> output.
> > The below query gives me the firstName and Last Name as expected but
> > not the email which is part of the child document.
> > http://localhost:8984/solr/Contact/select?fl=FirstName,LastName,email,
> > [child parentFilter=content_type:parent ]&indent=on
> > &q={!parent which="content_type:parent"} email:"tempe...@tempest.com"&wt=json
> >
> > I did try the sub query, but its returning only the parent document.
> > Not sure if I am missing something here.
> >
> > http://localhost:8984/solr/Contact/select??fl=id,FirstName,email,emp:[subquery]
> > &emp.rows=10&emp.fl=email
> > &emp.q={!term f=contact_id v=$row.id}&indent=on
> > &q={!parent which="content_type:parent"} email:"tempe...@tempest.com"
> > &wt=json&expandMacros=true
> >
> > I am expecting the result to be something like this. Could you
> > please advise.
> > {
> > "FirstName":"temptest",
> > "LastName":"temper",
> > "_childDocuments_":[
> > { "email":["tempe...@tempest.com"]
> > },
> > {"email":["temper.tt...@tempt.com"]
> > }
> > ]
> > }
> >
> > Thanks and Regards,
> > Preeti Bhat
> >
> > -Original Message-
> > From: Mikhail Khludnev [mailto:m...@apache.org]
> > Sent: Monday, January 30, 2017 5:49 PM
> > To: solr-user
> > Subject: Re: Return specific field from child documents.
> >
> > Hello,
> >
> > You hardly get any gain limiting child field output. You can do that
> > with [subquery] result transformer.
> >
> > On Mon, Jan 30, 2017 at 11:09 AM, Preeti Bhat
> > 
> > wrote:
> >
> > > Hi All,
> > >
> > > I am trying out the nested documents concept for SOLR. I am able
> > > to show the specific field for the parent document like
> > > "FirstName","LastName" but I am not able to show the specific
> > > field in
> > fl for child.
> > >
> > > I would like to retrieve the email from the _childDocuments. Could
> > > someone please advise.
> > >
> > > Q=+FirstName:"etr4tr" {!parent which="content_type:parent"}
> > > Fl=FirstName,LastName,email, [child
> > > parentFilter=content_type:parent ]
> > >
> > >   "response":{"numFound":1,"start":0,"docs":[
> > >   {
> > > "FirstName":"etr4tr",
> > > "LastName":"wrer6t",
> > > "_childDocuments_":[
> > > {
> > >   "id":"3556|12",
> > >   "company_id":["12"],
> > >   "email":["ehrijw.e...@b.com"],
> > >   "isPrimary":["true"]},
> > > {
> > >   "id":"3556|45",
> > >   "company_id":["45"],
> > >   "email":["ehrijw.eer54...@ccc.com"]}]}]
> > >   }}
> > >
> > >
> > > Thanks and Regards,
> > > Preeti Bhat
> > >
> > >
> > >

Re: Upserting doc fields from a SearchComponent

2017-02-01 Thread Charlie Hull

On 01/02/2017 15:55, Ugo Matrangolo wrote:

Hi Erick,

Personalizing a 'price' involves using an external service (luckily we
could cache most of the interactions) and it is accessed using a lib that
gets dropped in the Solr classpath.


Hi Ugo,

We built a Solr plugin 'XJoin', allowing you to use results from an 
external system with Solr. Here are two blog posts about it, the first 
may be relevant:

http://www.flax.co.uk/blog/2016/01/25/xjoin-solr-part-1-filtering-using-price-discount-data/
http://www.flax.co.uk/blog/2016/01/29/xjoin-solr-part-2-click-example/

Cheers

Charlie


What I need to do is this kind of flow:

1. Query (.. personalization params ...)
2. Find the initial search results (e.g. find all stuff 'converse')
3. From field values in search results docs (previously indexed and rarely
changing) I need to ask this 'personalizer' what is the actual price for
this item for this user in this moment
4. Add this as a normal doc field in the search response somewhere (e.g.
personalized_price)
5. Use stat/fq/personalized_price:[10 TO 99]

The idea is to plug a 'Pricer' search component in the query chain but I
was wondering if it was the best idea/practice before going all-in with
this approach.

Another option could be function queries and frange to do the filtering.

Hope this clarifies.

Best
Ugo

On Wed, Feb 1, 2017 at 3:44 PM, Erick Erickson 
wrote:


You need to be clear about what to do when. The [subquery], is
completely separate from _finding_ the top N docs. Your last
question is about finding the top N.

There are really two choices that spring to mind, depending on where
you keep your data about the user. Solr can't magically know that user
X wants a specific price range.

One choice would be to have the app layer contact wherever the data
is kept and add fq clauses.

Another is to keep the data in Solr somewhere and write a search component
that tacks this kind of clause on the incoming query.

Best,
Erick

On Wed, Feb 1, 2017 at 7:39 AM, Ugo Matrangolo 
wrote:

Hi,

tx for the speedy response.

What if I need to filter on the result matches ??

Example: I have a price I need to personalize per user/request and need
then to apply filter ranges on this personalized price (show only stuff

in

the 10$ - 99$ range).

WDYT ?

Best
Ugo

On Wed, Feb 1, 2017 at 3:34 PM, Erick Erickson 
wrote:


If the data is from another Solr instance, consider the [subquery]
Document Transformer here:
https://cwiki.apache.org/confluence/display/solr/
Transforming+Result+Documents#TransformingResultDocuments-[subquery]

More broadly, you can write a custom DocTransformer plugin to insert
anything you want in the output documents.

NOTE: DocTransformer only works on the top N docs being returned,
which is what I think you want. IOW, if rows=10 it only "sees" 10
documents even if numFound is millions.

Best,
Erick



On Wed, Feb 1, 2017 at 7:04 AM, Ugo Matrangolo <

ugo.matrang...@gmail.com>

wrote:

Hi,

I'm trying to write a SearchComponent that personalizes on the fly a

field

on all the docs resulting from the initial search query.

What I would like to do is to write a SearchComponent that intercepts

the

documents belonging to the result sets of a search query and upsert

one

or

more of their field values with infos retrieved from external

svc/logic.


I can't pre-compute all of these values because they are highly

dependent

on the user/context.

I was wondering if someone here already did something like this and,

more

broadly, has some experience in personalizing a search response in the

Solr

guts.

Best
Ugo









--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Teague James
Hello everyone! I'm still stuck on this issue and could really use some
help. I have a Solr 6.0.0 instance that is storing documents peppered with
text like "1a", "2e", "4c", etc. If I search the documents for a word, "ms",
"in", "the", etc., I get the correct number of hits and the results are
highlighted correctly in the highlighting section. But when I search for
"1a" or "2e" I get hits, but the highlights are blank. Further testing
revealed that the highlighter fails to highlight any two-character
alphanumeric combination, such as n0, b1, 1z, etc.:

...
<lst name="highlighting">
  <lst name="8667"/>
</lst>

Where "8667" is the document ID of the record that had the hit, but no
highlight. Other searches, "ms" for example, return:

...
<lst name="highlighting">
  <lst name="8667">
    <arr name="...">
      <str><em>MS</em></str>
    </arr>
  </lst>
</lst>


Why does highlighting fail for "1a" type searches? Any help is appreciated!
Thanks!

-Teague James



Re: Streaming Expressions result-set fields not in order

2017-02-01 Thread Zheng Lin Edwin Yeo
Hi Joel,

Thanks for your reply.

I've created the jira about this, with the issue number SOLR-10085


Regards,
Edwin

On 28 January 2017 at 10:26, Joel Bernstein  wrote:

> The issue is that fields are held in HashMaps internally so field order is
> not maintained. The thinking behind this was that field order was not so
> important as Tuples are mainly accessed by keys. But I think it's worth
> looking into an approach for maintaining field order. Feel free to create
> jira about this issue and update this thread with the issue number.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jan 25, 2017 at 9:59 AM, Zheng Lin Edwin Yeo  >
> wrote:
>
> > Hi,
> >
> > I'm trying out the Streaming Expressions in Solr 6.3.0.
> >
> > Currently, I'm facing the issue of not being able to get the fields in
> the
> > result-set to be displayed in the same order as what I put in the query.
> >
> > For example, when I execute this query:
> >
> >  http://localhost:8983/solr/collection1/stream?expr=facet(collection1,
> >   q="*:*",
> >   buckets="id,cost,quantity",
> >   bucketSorts="cost desc",
> >   bucketSizeLimit=100,
> >   sum(cost),
> >   sum(quantity),
> >   min(cost),
> >   min(quantity),
> >   max(cost),
> >   max(quantity),
> >   avg(cost),
> >   avg(quantity),
> >   count(*))=true
> >
> >
> > I get the following in the result-set.
> >
> >{
> >   "result-set":{"docs":[
> > {
> > "min(quantity)":12.21,
> > "avg(quantity)":12.21,
> > "sum(cost)":256.33,
> > "max(cost)":256.33,
> > "count(*)":1,
> > "min(cost)":256.33,
> > "cost":256.33,
> > "avg(cost)":256.33,
> > "quantity":12.21,
> > "id":"01",
> > "sum(quantity)":12.21,
> > "max(quantity)":12.21},
> > {
> > "EOF":true,
> > "RESPONSE_TIME":359}]}}
> >
> >
> > The fields are displayed randomly all over the place, instead of the
> order
> > sum, min, max, avg as in the query. Is there any way which I can do to
> the
> > fields in the result-set to be displayed in the same order as the query?
> >
> >
> > Regards,
> > Edwin
> >
>
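
A side note on the internals Joel mentions: the scrambling is exactly the
iteration behaviour of HashMap, and an insertion-ordered map would keep the
query order — a minimal plain-Java illustration (not Solr code):

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class FieldOrderDemo {
        public static void main(String[] args) {
            Map<String, Object> hashed = new HashMap<>();
            Map<String, Object> linked = new LinkedHashMap<>();
            for (String key : new String[] {"sum(cost)", "min(cost)", "max(cost)", "avg(cost)"}) {
                hashed.put(key, 1);
                linked.put(key, 1);
            }
            System.out.println(hashed.keySet()); // hash order: effectively arbitrary
            System.out.println(linked.keySet()); // insertion order: [sum(cost), min(cost), max(cost), avg(cost)]
        }
    }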


Phrase Queries and Punctuation

2017-02-01 Thread alessandro.benedetti
Hi all,
I was just thinking about phrase queries and punctuation (and, in general,
how to manage position increments when such a sentence delimiter happens).

At the moment, for multi-valued fields we have the "position increment gap",
which prevents phrase queries from spanning different values of the same
field.

In a single-valued textual field we may have hundreds of different
sentences (separated by punctuation).
Generally we don't want phrase queries to span different sentences, so I
would expect a similar position-increment behaviour.

A possible solution could be a tokenizer which is able to split sentences
(a lot of approaches in NLP are already there to be used) and add a
position increment gap between sentences as well (smaller than the
multi-value position increment gap).
A very naive solution would be to add the position increment whenever we
find a punctuation delimiter (much as a position increment is left behind
when stop words are removed).
I have not analysed the implementations in detail yet; at this stage I was
just wondering whether anyone has faced this problem with Lucene and Solr,
and what kind of side effects could happen if we add the position increment
gap on a punctuation-delimiter basis, by default in the Standard Tokenizer?
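
To make the "very naive solution" concrete, here is a hedged sketch of a
Lucene TokenFilter that bumps the position increment of the token following
sentence-final punctuation. It assumes a tokenizer that keeps trailing
punctuation on tokens (e.g. WhitespaceTokenizer); StandardTokenizer strips
punctuation, so there the gap would have to be injected differently:

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    public final class SentenceGapFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
        private final int gap;              // extra positions inserted between sentences
        private boolean pendingGap = false;

        public SentenceGapFilter(TokenStream in, int gap) {
            super(in);
            this.gap = gap;
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            if (pendingGap) {
                // first token of a new sentence: widen the gap so phrases can't span it
                posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + gap);
                pendingGap = false;
            }
            int len = termAtt.length();
            if (len > 0) {
                char last = termAtt.charAt(len - 1);
                if (last == '.' || last == '!' || last == '?') {
                    pendingGap = true;
                }
            }
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            pendingGap = false;
        }
    }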

Cheers




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Phrase-Queries-and-Punctuation-tp4318290.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upserting doc fields from a SearchComponent

2017-02-01 Thread Ugo Matrangolo
Hi Erick,

Personalizing a 'price' involves using an external service (luckily we
could cache most of the interactions) and it is accessed using a lib that
gets dropped in the Solr classpath.

What I need to do is this kind of flow:

1. Query (.. personalization params ...)
2. Find the initial search results (e.g. find all stuff 'converse')
3. From field values in search results docs (previously indexed and rarely
changing) I need to ask this 'personalizer' what is the actual price for
this item for this user in this moment
4. Add this as a normal doc field in the search response somewhere (e.g.
personalized_price)
5. Use stat/fq/personalized_price:[10 TO 99]

The idea is to plug a 'Pricer' search component in the query chain but I
was wondering if it was the best idea/practice before going all-in with
this approach.

Another option could be function queries and frange to do the filtering.
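
For the record, the frange route would look something like
fq={!frange l=10 u=99}sum(base_price_f, user_adjustment_f) — field and
function names invented here — since frange filters on the value of an
arbitrary function rather than a stored field. The catch is that the
personalized part must be expressible as a function over indexed values (or
something like an ExternalFileField), not an arbitrary service call.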

Hope this clarifies.

Best
Ugo

On Wed, Feb 1, 2017 at 3:44 PM, Erick Erickson 
wrote:

> You need to be clear about what to do when. The [subquery], is
> completely separate from _finding_ the top N docs. Your last
> question is about finding the top N.
>
> There are really two choices that spring to mind, depending on where
> you keep your data about the user. Solr can't magically know that user
> X wants a specific price range.
>
> One choice would be to have the app layer contact wherever the data
> is kept and add fq clauses.
>
> Another is to keep the data in Solr somewhere and write a search component
> that tacks this kind of clause on the incoming query.
>
> Best,
> Erick
>
> On Wed, Feb 1, 2017 at 7:39 AM, Ugo Matrangolo 
> wrote:
> > Hi,
> >
> > tx for the speedy response.
> >
> > What if I need to filter on the result matches ??
> >
> > Example: I have a price I need to personalize per user/request and need
> > then to apply filter ranges on this personalized price (show only stuff
> in
> > the 10$ - 99$ range).
> >
> > WDYT ?
> >
> > Best
> > Ugo
> >
> > On Wed, Feb 1, 2017 at 3:34 PM, Erick Erickson 
> > wrote:
> >
> >> If the data is from another Solr instance, consider the [subquery]
> >> Document Transformer here:
> >> https://cwiki.apache.org/confluence/display/solr/
> >> Transforming+Result+Documents#TransformingResultDocuments-[subquery]
> >>
> >> More broadly, you can write a custom DocTransformer plugin to insert
> >> anything you want in the output documents.
> >>
> >> NOTE: DocTransformer only works on the top N docs being returned,
> >> which is what I think you want. IOW, if rows=10 it only "sees" 10
> >> documents even if numFound is millions.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>
> >> On Wed, Feb 1, 2017 at 7:04 AM, Ugo Matrangolo <
> ugo.matrang...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I'm trying to write a SearchComponent that personalizes on the fly a
> >> field
> >> > on all the docs resulting from the initial search query.
> >> >
> >> > What I would like to do is to write a SearchComponent that intercepts
> the
> >> > documents belonging to the result sets of a search query and upsert
> one
> >> or
> >> > more of their field values with infos retrieved from external
> svc/logic.
> >> >
> >> > I can't pre-compute all of these values because they are highly
> dependent
> >> > on the user/context.
> >> >
> >> > I was wondering if someone here already did something like this and,
> more
> >> > broadly, has some experience in personalizing a search response in the
> >> Solr
> >> > guts.
> >> >
> >> > Best
> >> > Ugo
> >>
>


RE: Collection will not replicate

2017-02-01 Thread Anatharaman, Srinatha (Contractor)
Erick,

Thank you for your quick response, I appreciate your help
I am new to Solr and do not come from a Java background.

I developed the code in Dev on a single core and it works perfectly well.
On the QA box I have installed Solr 6.3 on 2 nodes (SolrCloud).

I made a few changes to the code to fit SolrCloud.
I have very little time to complete this job in QA.
Ultimately I need to load these files in real time; I am thinking of using
Flume/Kafka for that.
Just to show progress, I need to finish loading these email text files
using the dataimporthandler.
Herewith I am attaching my code. Please suggest what the issue could be.

Regards,
~Sri


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, February 01, 2017 10:39 AM
To: solr-user 
Subject: Re: Collection will not replicate

What version of Solr? Since 5.4 there's been a FORCELEADER collections API call 
that might help.

I'd run it with the newly added replicas offline. you only want it to have good 
replicas to choose from.

Best,
Erick

On Wed, Feb 1, 2017 at 6:48 AM, tedsolr  wrote:
> Update! I did find an error:
>
> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
> :org.apache.solr.common.SolrException: Error getting leader from zk 
> for shard shard1 
> Caused by: org.apache.solr.common.SolrException: Could not get leader props
> at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
> at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
> ... 14 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>
> When I view the cluster status I see that this shard does not have a leader.
> So it appears I need to force the leader designation to the "active"
> replica. How do I do that?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp431
> 8260p4318265.html Sent from the Solr - User mailing list archive at 
> Nabble.com.




  
  


[Attachment: tikaConfig.xml - the parser markup was stripped by the mail
archive; only the MIME-type lists survive, covering XML/SVG, legacy MS Office
(Word, Excel, PowerPoint, Visio, Outlook), OOXML, HTML, RTF, PDF, plain text
and scripts, and OpenDocument formats.]
  

Re: Upserting doc fields from a SearchComponent

2017-02-01 Thread Erick Erickson
You need to be clear about what to do when. The [subquery], is
completely separate from _finding_ the top N docs. Your last
question is about finding the top N.

There are really two choices that spring to mind, depending on where
you keep your data about the user. Solr can't magically know that user
X wants a specific price range.

One choice would be to have the app layer contact wherever the data
is kept and add fq clauses.

Another is to keep the data in Solr somewhere and write a search component
that tacks this kind of clause on the incoming query.
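
The first option is essentially a one-liner at the app layer. A hedged SolrJ
sketch — the price band comes from wherever the per-user data lives, and the
field name is invented:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery("converse");
    // band looked up per user by the application, outside Solr
    query.addFilterQuery("personalized_price:[10 TO 99]");
    QueryResponse rsp = solrClient.query(query); // solrClient: any existing SolrClient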

Best,
Erick

On Wed, Feb 1, 2017 at 7:39 AM, Ugo Matrangolo  wrote:
> Hi,
>
> tx for the speedy response.
>
> What if I need to filter on the result matches ??
>
> Example: I have a price I need to personalize per user/request and need
> then to apply filter ranges on this personalized price (show only stuff in
> the 10$ - 99$ range).
>
> WDYT ?
>
> Best
> Ugo
>
> On Wed, Feb 1, 2017 at 3:34 PM, Erick Erickson 
> wrote:
>
>> If the data is from another Solr instance, consider the [subquery]
>> Document Transformer here:
>> https://cwiki.apache.org/confluence/display/solr/
>> Transforming+Result+Documents#TransformingResultDocuments-[subquery]
>>
>> More broadly, you can write a custom DocTransformer plugin to insert
>> anything you want in the output documents.
>>
>> NOTE: DocTransformer only works on the top N docs being returned,
>> which is what I think you want. IOW, if rows=10 it only "sees" 10
>> documents even if numFound is millions.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Wed, Feb 1, 2017 at 7:04 AM, Ugo Matrangolo 
>> wrote:
>> > Hi,
>> >
>> > I'm trying to write a SearchComponent that personalizes on the fly a
>> field
>> > on all the docs resulting from the initial search query.
>> >
>> > What I would like to do is to write a SearchComponent that intercepts the
>> > documents belonging to the result sets of a search query and upsert one
>> or
>> > more of their field values with infos retrieved from external svc/logic.
>> >
>> > I can't pre-compute all of these values because they are highly dependent
>> > on the user/context.
>> >
>> > I was wondering if someone here already did something like this and, more
>> > broadly, has some experience in personalizing a search response in the
>> Solr
>> > guts.
>> >
>> > Best
>> > Ugo
>>


Re: Upserting doc fields from a SearchComponent

2017-02-01 Thread Ugo Matrangolo
Hi,

tx for the speedy response.

What if I need to filter on the result matches ??

Example: I have a price I need to personalize per user/request and need
then to apply filter ranges on this personalized price (show only stuff in
the 10$ - 99$ range).

WDYT ?

Best
Ugo

On Wed, Feb 1, 2017 at 3:34 PM, Erick Erickson 
wrote:

> If the data is from another Solr instance, consider the [subquery]
> Document Transformer here:
> https://cwiki.apache.org/confluence/display/solr/
> Transforming+Result+Documents#TransformingResultDocuments-[subquery]
>
> More broadly, you can write a custom DocTransformer plugin to insert
> anything you want in the output documents.
>
> NOTE: DocTransformer only works on the top N docs being returned,
> which is what I think you want. IOW, if rows=10 it only "sees" 10
> documents even if numFound is millions.
>
> Best,
> Erick
>
>
>
> On Wed, Feb 1, 2017 at 7:04 AM, Ugo Matrangolo 
> wrote:
> > Hi,
> >
> > I'm trying to write a SearchComponent that personalizes on the fly a
> field
> > on all the docs resulting from the initial search query.
> >
> > What I would like to do is to write a SearchComponent that intercepts the
> > documents belonging to the result sets of a search query and upsert one
> or
> > more of their field values with infos retrieved from external svc/logic.
> >
> > I can't pre-compute all of these values because they are highly dependent
> > on the user/context.
> >
> > I was wondering if someone here already did something like this and, more
> > broadly, has some experience in personalizing a search response in the
> Solr
> > guts.
> >
> > Best
> > Ugo
>


Re: Collection will not replicate

2017-02-01 Thread Erick Erickson
What version of Solr? Since 5.4 there's been a FORCELEADER collections
API call that might help.

I'd run it with the newly added replicas offline. you only want it to
have good replicas to choose from.

Best,
Erick

On Wed, Feb 1, 2017 at 6:48 AM, tedsolr  wrote:
> Update! I did find an error:
>
> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
> :org.apache.solr.common.SolrException: Error getting leader from zk for
> shard shard1
> 
> Caused by: org.apache.solr.common.SolrException: Could not get leader props
> at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
> at
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
> at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
> ... 14 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>
> When I view the cluster status I see that this shard does not have a leader.
> So it appears I need to force the leader designation to the "active"
> replica. How do I do that?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318265.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help in Tika on SolrCloud

2017-02-01 Thread Erick Erickson
Not quite sure. Are all the directories you expect the Tika docs to be
in available to the Solr node?

Sidestepping your question, I would be very reluctant to use Tika in
SolrCloud mode because you're then putting all of the heavy-duty
processing on nodes that also serve queries. I have the same
reservation about stand-alone FWIW, but it's worse in SolrCloud.

I strongly recommend you do the Tika parsing from a client and send
the resulting Solr doc to SorlCloud, I predict you'll eventually do
that anyway. Here's a skeletal program that does that in SolrJ:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

It also shows indexing from a DB, but that's easy enough to rip out.
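
A minimal sketch of that client-side flavor, combining Tika's
AutoDetectParser with SolrJ (the collection URL and field names are invented;
error handling omitted):

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class TikaClientIndexer {
        public static void main(String[] args) throws Exception {
            try (SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/dsearch")) {
                AutoDetectParser parser = new AutoDetectParser();
                for (String path : args) {
                    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
                    Metadata meta = new Metadata();
                    try (InputStream in = new FileInputStream(path)) {
                        parser.parse(in, handler, meta); // extraction happens client-side
                    }
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", path);
                    doc.addField("content", handler.toString());
                    solr.add(doc);
                }
                solr.commit();
            }
        }
    }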

Best,
Erick

On Wed, Feb 1, 2017 at 7:04 AM, Anatharaman, Srinatha (Contractor)
 wrote:
> Hi,
>
>
>
> I am new to Solr. I have implemented Solr on a single node and my code
> works well.
>
> When I move the same code to SolrCloud it fails (I made a few changes for
> SolrCloud).
>
>
>
> I am trying to load data using Dataimporthandler but it throws error as
> below
>
>
>
> 2017-02-01 03:23:07.727 ERROR (Thread-18) [c:dsearch s:shard2 r:core_node1
> x:dsearch_shard2_replica1] o.a.s.h.d.DataImporter Full Import
> failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> load Tika Config Processing Document # 1
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
>
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
>
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
>
> at
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> load Tika Config Processing Document # 1
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
>
> ... 4 more
>
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to load Tika Config Processing Document # 1
>
> at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
>
> at
> org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:96)
>
> at
> org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:60)
>
> at
> org.apache.solr.handler.dataimport.TikaEntityProcessor.init(TikaEntityProcessor.java:76)
>
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
>
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
>
> ... 6 more
>
> Caused by: org.apache.solr.common.cloud.ZooKeeperException:
> ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
> trying to do is not supported in ZooKeeper mode
>
> at
> org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
>
> at
> org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
>
> ... 12 more
>
>
>
>
>
> I have attached the code for your reference
>
> Could you please help me with the solution
>
>
>
> Regards,
>
> ~Sri
>
>
>
>
>
>


Re: Upserting doc fields from a SearchComponent

2017-02-01 Thread Erick Erickson
If the data is from another Solr instance, consider the [subquery]
Document Transformer here:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery]

More broadly, you can write a custom DocTransformer plugin to insert
anything you want in the output documents.

NOTE: DocTransformer only works on the top N docs being returned,
which is what I think you want. IOW, if rows=10 it only "sees" 10
documents even if numFound is millions.

Best,
Erick



On Wed, Feb 1, 2017 at 7:04 AM, Ugo Matrangolo  wrote:
> Hi,
>
> I'm trying to write a SearchComponent that personalizes on the fly a field
> on all the docs resulting from the initial search query.
>
> What I would like to do is to write a SearchComponent that intercepts the
> documents belonging to the result sets of a search query and upsert one or
> more of their field values with infos retrieved from external svc/logic.
>
> I can't pre-compute all of these values because they are highly dependent
> on the user/context.
>
> I was wondering if someone here already did something like this and, more
> broadly, has some experience in personalizing a search response in the Solr
> guts.
>
> Best
> Ugo


Upserting doc fields from a SearchComponent

2017-02-01 Thread Ugo Matrangolo
Hi,

I'm trying to write a SearchComponent that personalizes on the fly a field
on all the docs resulting from the initial search query.

What I would like to do is to write a SearchComponent that intercepts the
documents belonging to the result sets of a search query and upsert one or
more of their field values with infos retrieved from external svc/logic.

I can't pre-compute all of these values because they are highly dependent
on the user/context.

I was wondering if someone here already did something like this and, more
broadly, has some experience in personalizing a search response in the Solr
guts.

Best
Ugo


Re: 1:n relation and function queries

2017-02-01 Thread Mikhail Khludnev
Why can't you get the score at the child level and combine it at the parent level?
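
Concretely, something along these lines — a hedged sketch in the spirit of
the scoring-join approach, where every field name is invented, 55 stands for
the computed product price, and the {!func} clause contributes the summed
cost as the child score, which score=min then surfaces at the vendor level:

    q={!parent which="content_type:vendor" score=min v=$band}
    band=+price_from_i:[* TO 55] +price_to_i:[55 TO *] _query_:"{!func}sum(delivery_cost_f,55)"
    sort=score asc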

On Wed, Feb 1, 2017 at 5:33 PM, Ewald Moitzi  wrote:

> Hello Mikhail,
>
> I was using the functions as fl and sort parameters, and this
> gives no result.
>
> When sorting by score, as you did in your blog, I get the values
> from the child documents. (I missed the score=max parameter)
>
> However, i need to combine values from the parent and child, like
> this: {!func}sum(price_i, field_from_parent).
>
> At this point I get a SyntaxError like this:
> "org.apache.solr.search.SyntaxError: Expected ')' at position 32 in
> 'sum(shipping_cost_f,startup_cost'"
>
> But the ')' is there: sum(shipping_cost_f,startup_cost)
>
> Do I have the syntax wrong? Or is this function not allowed in this
> context?
>
> Thanks,
> Ewald
>
>
>
> On 2017-02-01 12:59, Mikhail Khludnev wrote:
> > Ewald,
> >
> > Functional queries combines well with block join as well as query time
> > join, here are examples for latter one
> > http://blog-archive.griddynamics.com/2015/08/
> scoring-join-party-in-solr-53.html
> > It must be the same for block join.
> > What doesn't work exactly?
> >
> > On Wed, Feb 1, 2017 at 1:39 PM, Ewald Moitzi <
> ewald.moi...@student.tugraz.at
> >> wrote:
> >
> >> Hello,
> >>
> >> I am unsure if solr is the right solution for a problem
> >> that we have, or if it is better to stick with a relational
> >> database (and if it should be done in solr how to implement it).
> >> The explanation is a bit lengthy, but please
> >> bear with me.
> >>
> >> The problem:
> >> Sort results of a vendor search for a product according to price
> >> including delivery costs.
> >>
> >> The data:
> >> The store itself is a marketplace, and each product can be
> >> supplied by different vendors. The vendors can define delivery
> >> costs for different price ranges.
> >> E.g:
> >>
> >>           | price from | price to | delivery cost |
> >>           |          0 |       49 |            10 |
> >> vendor ---|         50 |       99 |             5 |
> >>           |        100 |      max |             0 |
> >>
> >> So, for a product with a price of 55, I want the result to be 60.
> >>
> >> Additional requirements:
> >>  - The product price is also calculated, based on properties
> >>of the vendor.
> >>  - There is also a pickup option, and there should be no
> >>duplicate results.
> >>  - Different shipping costs for different countries.
> >>
> >> Progress so far:
> >> My idea is to store each range as a subdocument for a vendor, but
> >> I don't know how to construct a query for that. So far I have
> >> managed to implement a simpler version that gives the right result for
> >> each country using dynamic fields, but this uses only a free delivery
> >> above x approach and that is not what we want.
> >>
> >> I have looked into the Block Join Query parser, but as far as I can
> >> tell this does not allow to construct a function query with inputs
> >> from parent and child documents.
> >>
> >> Why solr:
> >>  - sort and limit result according to geolocation.
> >>  - we will deploy solr anyhow in this project, for a classic
> >>full text search.
> >>
> >> As said above, I'm not really sure if this is a good application
> >> for solr, but the geolocation features are quite handy. And the
> >> query is not really fast in a relational db either.
> >>
> >> Any input is greatly appreciated.
> >>
> >> Regards,
> >> Ewald
> >>
> >>
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Need help in Tika on SolrCloud

2017-02-01 Thread Anatharaman, Srinatha (Contractor)
Hi,

I am new to Solr. I have implemented Solr on a single node and my code works 
well.
When I move the same code to SolrCloud it fails (I made a few changes for 
SolrCloud).

I am trying to load data using Dataimporthandler but it throws error as below

2017-02-01 03:23:07.727 ERROR (Thread-18) [c:dsearch s:shard2 r:core_node1 x:dsearch_shard2_replica1] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:458)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load Tika Config Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:96)
at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:60)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.init(TikaEntityProcessor.java:76)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 6 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
... 12 more


I have attached the code for your reference.
Could you please help me with a solution?

Regards,
~Sri
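
For what it's worth, the bottom of the stack trace shows TikaEntityProcessor
asking the resource loader for a config directory, which ZkSolrResourceLoader
cannot provide in SolrCloud mode; one workaround that has been suggested is to
drop the tikaConfig attribute so Tika falls back to its default configuration.
A minimal data-config.xml sketch (the baseDir, fileName pattern and field names
are illustrative, not taken from the attachment below):

  <dataConfig>
    <dataSource type="BinFileDataSource"/>
    <document>
      <!-- walk a local directory of documents -->
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="/data/docs" fileName=".*\.(pdf|docx?|html?)"
              rootEntity="false" dataSource="null">
        <!-- no tikaConfig attribute: the default Tika config avoids the
             getConfigDir() call that fails under ZooKeeper -->
        <entity name="tika" processor="TikaEntityProcessor"
                url="${files.fileAbsolutePath}" format="text">
          <field column="text" name="content"/>
          <field column="title" name="title"/>
        </entity>
      </entity>
    </document>
  </dataConfig>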






  
  


[Attachment: tika-config.xml. The XML markup was stripped by the mail archive;
what survives is the MIME-type list for each configured parser, grouped as in
the original:

  XML parser:
    application/xml, image/svg+xml, text/xml, application/x-google-gadget
  Legacy MS Office parser:
    application/excel, application/xls, application/msworddoc,
    application/msworddot, application/powerpoint, application/ppt,
    application/x-tika-msoffice, application/msword, application/vnd.ms-excel,
    application/vnd.ms-excel.sheet.binary.macroenabled.12,
    application/vnd.ms-powerpoint, application/vnd.visio,
    application/vnd.ms-outlook
  OOXML parser:
    application/x-tika-ooxml,
    application/vnd.openxmlformats-package.core-properties+xml,
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,
    application/vnd.openxmlformats-officedocument.spreadsheetml.template,
    application/vnd.ms-excel.sheet.macroenabled.12,
    application/vnd.ms-excel.template.macroenabled.12,
    application/vnd.ms-excel.addin.macroenabled.12,
    application/vnd.openxmlformats-officedocument.presentationml.presentation,
    application/vnd.openxmlformats-officedocument.presentationml.template,
    application/vnd.openxmlformats-officedocument.presentationml.slideshow,
    application/vnd.ms-powerpoint.presentation.macroenabled.12,
    application/vnd.ms-powerpoint.slideshow.macroenabled.12,
    application/vnd.ms-powerpoint.addin.macroenabled.12,
    application/vnd.openxmlformats-officedocument.wordprocessingml.document,
    application/vnd.openxmlformats-officedocument.wordprocessingml.template,
    application/vnd.ms-word.document.macroenabled.12,
    application/vnd.ms-word.template.macroenabled.12
  HTML parser:
    text/html
  RTF parser:
    application/rtf
  PDF parser:
    application/pdf
  Plain text / script parser:
    text/plain, script/groovy, application/x-groovy,
    application/x-javascript, application/javascript, text/javascript
  OpenDocument parser:
    application/vnd.oasis.opendocument.database, application/vnd.sun.xml.writer,
    application/vnd.oasis.opendocument.text,
    application/vnd.oasis.opendocument.graphics]

Re: Collection will not replicate

2017-02-01 Thread tedsolr
Update! I did find an error: 

2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
:org.apache.solr.common.SolrException: Error getting leader from zk for
shard shard1

Caused by: org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
... 14 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)

When I view the cluster status I see that this shard does not have a leader.
So it appears I need to force the leader designation to the "active"
replica. How do I do that?
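
If the Solr version has it (the Collections API gained a FORCELEADER action
around Solr 5.4, so it may be missing on older releases), the call looks like
this (colname stands in for the real collection name):

  http://host:8983/solr/admin/collections?action=FORCELEADER&collection=colname&shard=shard1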





Re: 1:n relation and function queries

2017-02-01 Thread Ewald Moitzi
Hello Mikhail,

I was using the functions as fl and sort parameters, and this
gives no result.

When sorting by score, as you did in your blog, I get the values
from the child documents. (I missed the score=max parameter)

However, I need to combine values from the parent and child, like
this: {!func}sum(price_i, field_from_parent).

At this point I get a SyntaxError like this:
"org.apache.solr.search.SyntaxError: Expected ')' at position 32 in
'sum(shipping_cost_f,startup_cost'"

But the ')' is there: sum(shipping_cost_f,startup_cost)

Do I have the syntax wrong? Or is this function not allowed in this
context?

Thanks,
Ewald
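
One pattern that can express this, for what it's worth (a sketch against an
assumed schema, untested; type_s, delivery_cost_f and startup_cost_f are
made-up field names): a parent-side function cannot read a child field
directly, but a score=max block join can lift the child value into a score,
and query() then exposes that score to a parent-side function:

  rangeq = +type_s:range +_query_:"{!func}delivery_cost_f"
  childq = {!parent which='type_s:offer' score=max v=$rangeq}
  q      = type_s:offer
  fl     = *,total:sum(startup_cost_f,query($childq))
  sort   = sum(startup_cost_f,query($childq)) asc

The rangeq clause would additionally need the from/to conditions that select
the shipping band matching the product's price.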



On 2017-02-01 12:59, Mikhail Khludnev wrote:
> Ewald,
> 
> Function queries combine well with block join as well as with query-time
> join; here are examples for the latter:
> http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
> It must be the same for block join.
> What doesn't work exactly?
> 
On Wed, Feb 1, 2017 at 1:39 PM, Ewald Moitzi wrote:
> 
>> Hello,
>>
>> I am unsure if solr is the right solution for a problem
>> that we have, or if it is better to stick with a relational
>> database (and if it should be done in solr how to implement it).
>> The explanation is a bit lengthy, but please
>> bear with me.
>>
>> The problem:
>> Sort results of a vendor search for a product according to price
>> including delivery costs.
>>
>> The data:
>> The store itself is a marketplace, and each product can be
>> supplied by different vendors. The vendors can define delivery
>> costs for different price ranges.
>> E.g:
>>
>>           | price from | price to | delivery cost |
>>           |          0 |       49 |            10 |
>> vendor  --|         50 |       99 |             5 |
>>           |        100 |      max |             0 |
>>
>> So, for a product with a price of 55, I want the result to be 60.
>>
>> Additional requirements:
>>  - The product price is also calculated, based on properties
>>of the vendor.
>>  - There is also a pickup option, and there should be no
>>duplicate results.
>>  - Different shipping costs for different countries.
>>
>> Progress so far:
>> My idea is to store each range as a subdocument for a vendor, but
>> I don't know how to construct a query for that. So far I have
>> managed to implement a simpler version that gives the right result for
>> each country using dynamic fields, but this supports only a 'free delivery
>> above X' approach, and that is not what we want.
>>
>> I have looked into the Block Join Query parser, but as far as I can
>> tell it does not allow constructing a function query with inputs
>> from parent and child documents.
>>
>> Why solr:
>>  - sort and limit result according to geolocation.
>>  - we will deploy solr anyhow in this project, for a classic
>>full text search.
>>
>> As said above, I'm not really sure if this is a good application
>> for solr, but the geolocation features are quite handy. And the
>> query is not really fast in a relational db either.
>>
>> Any input is greatly appreciated.
>>
>> Regards,
>> Ewald
>>
>>
> 
> 


Collection will not replicate

2017-02-01 Thread tedsolr
I have a collection (1 shard, 2 replicas) that was doing a batch update when
one solr host ran out of disk space. The batch job failed at that point, and
one replica got corrupted. I deleted the bad replica. I've tried several
times since then to add a new replica. The status of the request is
"running" for about 30 minutes or so, then it completes but the new replica
is always "down" and has 0 documents.

The collection only has 15 million docs. Adding a replica to a shard that
small should only take a couple minutes. I haven't seen any errors in the
solr logs during the replication process. Has anyone seen this behavior
before? What should I be looking at for diagnostic purposes? 

Thanks for the support
v5.2.1
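
For reference, the add-and-poll sequence here typically looks like this (a
sketch; host, port, node name and the async id are illustrative):

  # submit asynchronously so the request can be tracked
  http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=colname&shard=shard1&node=host2:8983_solr&async=addrep1

  # poll until the state is no longer "running"
  http://host:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=addrep1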





Re: Query structure

2017-02-01 Thread KRIS MUSSHORN
This was the solution. 
Thank you! 

- Original Message -

From: "Maciej Ł. PCSS"  
To: solr-user@lucene.apache.org 
Sent: Wednesday, February 1, 2017 7:57:05 AM 
Subject: Re: Query structure 

You should be able to put 'facetMetatagDatePrefix4:2015 OR 
facetMetatagDatePrefix4:2016' into the filtering query. 

Maciej 


On 01.02.2017 at 13:43, KRIS MUSSHORN wrote: 
> I really need some guidance on this query structure issue. 
> 
> I've got to get this solved today for my employer. 
> 
> "Help me Obiwan. Your my only hope" 
> 
> K 
> 
> - Original Message - 
> 
> From: "KRIS MUSSHORN"  
> To: solr-user@lucene.apache.org 
> Sent: Tuesday, January 31, 2017 12:31:13 PM 
> Subject: Query structure 
> 
> I have defaultSearchField and facetMetatagDatePrefix4 fields that are 
> correctly populated with values in SOLR 5.4.1. 
> 
> If I execute this query q=defaultSearchField:this text 
> I get the 7 docs that match. 
> There are three docs in 2015 and one doc in 2016 per the facet counts in the 
> results. 
> If I then use q=defaultSearchField:this text AND facetMetatagDatePrefix4:2015 I 
> get the correct 3 documents. 
> 
> How would I structure my query to get defaultSearchField:this text AND 
> (facetMetatagDatePrefix4:2015 OR facetMetatagDatePrefix4:2016) and return 
> only 4 docs? 
> 
> TIA, 
> Kris 
> 
> 
> 




Re: Fw: solr-user-unsubscribe

2017-02-01 Thread alessandro.benedetti
Gents,
have you read the instructions?
Have you sent an email to solr-user-unsubscr...@lucene.apache.org?

You don't need to send messages to the mailing list with that address as
content.
Just follow what's in the official Solr documentation page:

http://lucene.apache.org/solr/community.html#mailing-lists-irc

Thank you 



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Fw: solr-user-unsubscribe

2017-02-01 Thread Syed Mudasseer
Can someone help me unsubscribe from the solr-user emails?

I tried sending "unsubscribe" emails to "solr-user@lucene.apache.org" but no 
luck.


Thanks,

Mudasseer


From: Syed Mudasseer 
Sent: Monday, January 30, 2017 12:55 PM
To: solr-user@lucene.apache.org
Subject: solr-user-unsubscribe




Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread alessandro.benedetti
What I meant is that:
"Components
If you define components, the default components (see above) will not be
executed, and first-components and last-components are disallowed:"

Anyway it is documented in the Confluence page.
If you don't override the default components, json facets will be there.

Regarding the converter, I don't think there's anything out there.
You will need to define the JSON facets as JSON yourself.

Cheers
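
As an illustration of what that hand conversion looks like, a legacy field
facet such as

  facet=true&facet.field=category&facet.limit=10&facet.mincount=1

maps to this in the JSON Facet API (category is an assumed field name):

  json.facet={ categories : { type : terms, field : category, limit : 10, mincount : 1 } }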





Re: Query structure

2017-02-01 Thread Maciej Ł. PCSS
You should be able to put 'facetMetatagDatePrefix4:2015 OR 
facetMetatagDatePrefix4:2016' into the filtering query.


Maciej
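
Spelled out as request parameters, that is (the core name and URL here are
illustrative):

  http://localhost:8983/solr/mycore/select
      ?q=defaultSearchField:(this text)
      &fq=facetMetatagDatePrefix4:(2015 OR 2016)

Keeping the year clause in fq rather than q also lets Solr cache that filter
independently of the main query.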


On 01.02.2017 at 13:43, KRIS MUSSHORN wrote:

I really need some guidance on this query structure issue.

I've got to get this solved today for my employer.

"Help me Obiwan. Your my only hope"

K

- Original Message -

From: "KRIS MUSSHORN" 
To: solr-user@lucene.apache.org
Sent: Tuesday, January 31, 2017 12:31:13 PM
Subject: Query structure

I have defaultSearchField and facetMetatagDatePrefix4 fields that are 
correctly populated with values in SOLR 5.4.1.

If I execute this query q=defaultSearchField:this text
I get the 7 docs that match.
There are three docs in 2015 and one doc in 2016 per the facet counts in the 
results.
If I then use q=defaultSearchField:this text AND facetMetatagDatePrefix4:2015 I get 
the correct 3 documents.

How would I structure my query to get defaultSearchField:this text AND 
(facetMetatagDatePrefix4:2015 OR facetMetatagDatePrefix4:2016) and return only 
4 docs?

TIA,
Kris







Re: Query structure

2017-02-01 Thread Maciej Ł. PCSS

Why not use a filtering query? I mean the 'fq' param.

Regards
Maciej


On 01.02.2017 at 13:43, KRIS MUSSHORN wrote:

I really need some guidance on this query structure issue.

I've got to get this solved today for my employer.

"Help me Obiwan. Your my only hope"

K

- Original Message -

From: "KRIS MUSSHORN" 
To: solr-user@lucene.apache.org
Sent: Tuesday, January 31, 2017 12:31:13 PM
Subject: Query structure

I have defaultSearchField and facetMetatagDatePrefix4 fields that are 
correctly populated with values in SOLR 5.4.1.

If I execute this query q=defaultSearchField:this text
I get the 7 docs that match.
There are three docs in 2015 and one doc in 2016 per the facet counts in the 
results.
If I then use q=defaultSearchField:this text AND facetMetatagDatePrefix4:2015 I get 
the correct 3 documents.

How would I structure my query to get defaultSearchField:this text AND 
(facetMetatagDatePrefix4:2015 OR facetMetatagDatePrefix4:2016) and return only 
4 docs?

TIA,
Kris







Re: Query structure

2017-02-01 Thread KRIS MUSSHORN
I really need some guidance on this query structure issue. 

I've got to get this solved today for my employer. 

"Help me Obiwan. Your my only hope" 

K 

- Original Message -

From: "KRIS MUSSHORN"  
To: solr-user@lucene.apache.org 
Sent: Tuesday, January 31, 2017 12:31:13 PM 
Subject: Query structure 

I have defaultSearchField and facetMetatagDatePrefix4 fields that are 
correctly populated with values in SOLR 5.4.1. 

If I execute this query q=defaultSearchField:this text 
I get the 7 docs that match. 
There are three docs in 2015 and one doc in 2016 per the facet counts in the 
results. 
If I then use q=defaultSearchField:this text AND facetMetatagDatePrefix4:2015 I get 
the correct 3 documents. 

How would I structure my query to get defaultSearchField:this text AND 
(facetMetatagDatePrefix4:2015 OR facetMetatagDatePrefix4:2016) and return only 
4 docs? 

TIA, 
Kris 




Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
And I still have a question:
Is there some converter from the legacy API to the new API?
Or a search component that converts from the legacy API to the JSON facet API?

I explained why I need it in my first post.

Thank you





Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
Alessandro, it helped! Thank you.
But I asked which changes we need to make in the configuration, and I think these things
must be documented in the reference guide.
About your question: first of all, I don't override the default components. Second,
I add my own components, for many reasons (for example, I check
permissions before each query with my own component).






Re: 1:n relation and function queries

2017-02-01 Thread Mikhail Khludnev
Ewald,

Function queries combine well with block join as well as with query-time
join; here are examples for the latter:
http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
It must be the same for block join.
What doesn't work exactly?

On Wed, Feb 1, 2017 at 1:39 PM, Ewald Moitzi  wrote:

> Hello,
>
> I am unsure if solr is the right solution for a problem
> that we have, or if it is better to stick with a relational
> database (and if it should be done in solr how to implement it).
> The explanation is a bit lengthy, but please
> bear with me.
>
> The problem:
> Sort results of a vendor search for a product according to price
> including delivery costs.
>
> The data:
> The store itself is a marketplace, and each product can be
> supplied by different vendors. The vendors can define delivery
> costs for different price ranges.
> E.g:
>
>           | price from | price to | delivery cost |
>           |          0 |       49 |            10 |
> vendor  --|         50 |       99 |             5 |
>           |        100 |      max |             0 |
>
> So, for a product with a price of 55, I want the result to be 60.
>
> Additional requirements:
>  - The product price is also calculated, based on properties
>of the vendor.
>  - There is also a pickup option, and there should be no
>duplicate results.
>  - Different shipping costs for different countries.
>
> Progress so far:
> My idea is to store each range as a subdocument for a vendor, but
> I don't know how to construct a query for that. So far I have
> managed to implement a simpler version that gives the right result for
> each country using dynamic fields, but this supports only a 'free delivery
> above X' approach, and that is not what we want.
>
> I have looked into the Block Join Query parser, but as far as I can
> tell it does not allow constructing a function query with inputs
> from parent and child documents.
>
> Why solr:
>  - sort and limit result according to geolocation.
>  - we will deploy solr anyhow in this project, for a classic
>full text search.
>
> As said above, I'm not really sure if this is a good application
> for solr, but the geolocation features are quite handy. And the
> query is not really fast in a relational db either.
>
> Any input is greatly appreciated.
>
> Regards,
> Ewald
>
>


-- 
Sincerely yours
Mikhail Khludnev


Re: [Benchmark SOLR] JETTY VS TOMCAT - Jetty 15% slower - need advice to improve Jetty performance

2017-02-01 Thread Gerald Reinhart



We have done some profiling with VisualVM, but nothing obvious appeared.

Thanks, Rick, for the advice.

Gérald Reinhart
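
For anyone tuning the same knobs: the thread, acceptor and selector settings
live in Jetty's etc/jetty.xml. A sketch of the thread-pool section, assuming
the stock Jetty 9 layout that ships with Solr 5.x (the values are illustrative,
not a recommendation):

  <Configure id="Server" class="org.eclipse.jetty.server.Server">
    <!-- server-wide thread pool; requests queue once maxThreads is reached -->
    <Arg name="threadpool">
      <New id="threadpool" class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <Set name="minThreads">32</Set>
        <Set name="maxThreads">500</Set>
        <Set name="idleTimeout">60000</Set>
      </New>
    </Arg>
  </Configure>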


On 02/01/2017 11:17 AM, Rick Leir wrote:

There is a profiling tool in Eclipse that can show you a tree of method
calls, with timing information. I have found this useful in the past to
investigate a performance problem. But it might not help if the problem
only occurs at 165 queries per second (is that true?).

cheers -- Rick


On 2017-01-30 04:02 AM, Gerald Reinhart wrote:

Hello,

 In addition to the following settings, we have tried to:
 - force Jetty to use more threads
 - put the same GC options as our Tomcat
 - change the number of acceptors and selectors

and every time Jetty is slower than Tomcat.

Any advice is welcome

Thanks,

Gérald Reinhart




On 01/27/2017 11:22 AM, Gerald Reinhart wrote:

Hello,

  We are migrating our platform
  from
   - Solr 5.4.1 hosted by a Tomcat
  to
   - Solr 5.4.1 standalone (hosted by Jetty)

=> Jetty is 15% slower than Tomcat under the same conditions.


 Here are details about the benchmarks :

 Context:
  - Index with 9 000 000 documents
  - Gatling replays queries extracted from the real traffic
  - Server: R410 with 16 virtual CPUs and 96 GB RAM

 Results with 20 clients in parallel for 10 minutes:
  For Tomcat:
  - 165 queries per second
  - 120 ms mean response time

  For Jetty:
  - 139 queries per second
  - 142 ms mean response time

We have checked:
 - the load of the server => same
 - the I/O wait => same
 - the memory used in the JVM => same
 - JVM GC settings => same

For us, it's a blocker for the migration.

Is it a known issue? (I found this:
http://www.asjava.com/jetty/jetty-vs-tomcat-performance-comparison/)

How can we improve the performance of Jetty? (We have already
followed
http://www.eclipse.org/jetty/documentation/9.2.21.v20170120/optimizing.html

recommendation)

   Many thanks,


Gérald Reinhart




--

Gérald Reinhart
Software Engineer



E: gerald.reinh...@kelkoo.com
A: Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles, FR





--

Gérald Reinhart
Software Engineer



E: gerald.reinh...@kelkoo.com
A: Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles, FR




1:n relation and function queries

2017-02-01 Thread Ewald Moitzi
Hello,

I am unsure if solr is the right solution for a problem
that we have, or if it is better to stick with a relational
database (and if it should be done in solr how to implement it).
The explanation is a bit lengthy, but please
bear with me.

The problem:
Sort results of a vendor search for a product according to price
including delivery costs.

The data:
The store itself is a marketplace, and each product can be
supplied by different vendors. The vendors can define delivery
costs for different price ranges.
E.g:

           | price from | price to | delivery cost |
           |          0 |       49 |            10 |
 vendor  --|         50 |       99 |             5 |
           |        100 |      max |             0 |

So, for a product with a price of 55, I want the result to be 60.

Additional requirements:
 - The product price is also calculated, based on properties
   of the vendor.
 - There is also a pickup option, and there should be no
   duplicate results.
 - Different shipping costs for different countries.

Progress so far:
My idea is to store each range as a subdocument for a vendor, but
I don't know how to construct a query for that. So far I have
managed to implement a simpler version that gives the right result for
each country using dynamic fields, but this supports only a 'free delivery
above X' approach, and that is not what we want.

I have looked into the Block Join Query parser, but as far as I can
tell it does not allow constructing a function query with inputs
from parent and child documents.

Why solr:
 - sort and limit result according to geolocation.
 - we will deploy solr anyhow in this project, for a classic
   full text search.

As said above, I'm not really sure if this is a good application
for solr, but the geolocation features are quite handy. And the
query is not really fast in a relational db either.

Any input is greatly appreciated.

Regards,
Ewald



Re: [Benchmark SOLR] JETTY VS TOMCAT - Jetty 15% slower - need advice to improve Jetty performance

2017-02-01 Thread Rick Leir
There is a profiling tool in Eclipse that can show you a tree of method 
calls, with timing information. I have found this useful in the past to 
investigate a performance problem. But it might not help if the problem 
only occurs at 165 queries per second (is that true?).


cheers -- Rick


On 2017-01-30 04:02 AM, Gerald Reinhart wrote:


Hello,

In addition to the following settings, we have tried to:
- force Jetty to use more threads
- put the same GC options as our Tomcat
- change the number of acceptors and selectors

   and every time Jetty is slower than Tomcat.

   Any advice is welcome

Thanks,

Gérald Reinhart




On 01/27/2017 11:22 AM, Gerald Reinhart wrote:

Hello,

 We are migrating our platform
 from
  - Solr 5.4.1 hosted by a Tomcat
 to
  - Solr 5.4.1 standalone (hosted by Jetty)

=> Jetty is 15% slower than Tomcat under the same conditions.


Here are details about the benchmarks :

Context:
 - Index with 9 000 000 documents
 - Gatling replays queries extracted from the real traffic
 - Server: R410 with 16 virtual CPUs and 96 GB RAM

Results with 20 clients in parallel for 10 minutes:
 For Tomcat:
 - 165 queries per second
 - 120 ms mean response time

 For Jetty:
 - 139 queries per second
 - 142 ms mean response time

We have checked:
- the load of the server => same
- the I/O wait => same
- the memory used in the JVM => same
- JVM GC settings => same

   For us, it's a blocker for the migration.

   Is it a known issue? (I found this:
http://www.asjava.com/jetty/jetty-vs-tomcat-performance-comparison/)

   How can we improve the performance of Jetty? (We have already
followed
http://www.eclipse.org/jetty/documentation/9.2.21.v20170120/optimizing.html 


recommendation)

  Many thanks,


Gérald Reinhart





--

Gérald Reinhart
Software Engineer



E: gerald.reinh...@kelkoo.com
A: Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles, FR






Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread alessandro.benedetti
The reason is to be found in the default list of components:

protected List<String> getDefaultComponents()
{
  ArrayList<String> names = new ArrayList<>(8);
  names.add( QueryComponent.COMPONENT_NAME );
  names.add( FacetComponent.COMPONENT_NAME );
  *names.add( FacetModule.COMPONENT_NAME );*
  names.add( MoreLikeThisComponent.COMPONENT_NAME );
  ...
  return names;
}

Based on the Solr doc [1], you were overriding the list of components and
forgetting the component required for JSON faceting (the facet module).
For JSON facets you need not only the facet component but also the
facet module.
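
Concretely, an explicit list that keeps JSON faceting working would be
something like this in solrconfig.xml (facet_module being
FacetModule.COMPONENT_NAME; add back whichever other defaults you still need):

  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>facet_module</str>
    <str>debug</str>
  </arr>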

Furthermore, what is your reason for overriding the default search components
in your request handler?

Cheers

[1]
https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig





Re: Upgrade SOLR version - facets perfomance regression

2017-02-01 Thread SOLR4189
I noticed that if I don't write a list of components in the request handler it works
fine, but if I add something like

  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>

Facets don't work...
How can you explain it?





Re: How to combine third party search data as top results ?

2017-02-01 Thread Charlie Hull

On 31/01/2017 19:04, Shamik Bandopadhyay wrote:

Hi,

  I'm trying to integrate results from a third party source with our
existing search. The idea is to include the top 5 results from this source
as the top result of our search.Though the external data is indexed in our
system, the use case dictates us to use their ranking (by getting the top
five result). Problem is, their result returns only text, title, and url.
To construct the final response, I need to include a bunch of metadata
fields which is only available in our index. Here are the steps:
1. Query external source, get top five results.
2. Query our index based on url from each result, retrieve their
corresponding id.
3. Query our index and pass the ids as elevateIds (dynamic query elevation)

This probably isn't a clean solution as it adds the overhead of an
additional query to retrieve document ids. Just wondering if there's a
better way to handle this situation, perhaps a way to combine step 2 and 3
in a single query or a different approach altogether?

Any pointers will be appreciated.

-Thanks,
Shamik


Hi Shamik,

I'm not sure if this will help, but we built a plugin for 'XJoin', 
allowing you to use results from an external system with Solr. Here are 
two blog posts about it:

http://www.flax.co.uk/blog/2016/01/25/xjoin-solr-part-1-filtering-using-price-discount-data/
http://www.flax.co.uk/blog/2016/01/29/xjoin-solr-part-2-click-example/

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
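
For the elevateIds route in steps 2 and 3 above, the two requests would look
roughly like this (the core name and the url/id field names are assumptions,
and the QueryElevationComponent must be enabled on the handler):

  # step 2: resolve the five external URLs to Solr ids in one query
  /solr/mycore/select?q=*:*&fq={!terms f=url}http://a.example/x,http://b.example/y&fl=id&rows=5

  # step 3: the main query, with the resolved ids pinned on top
  /solr/mycore/select?q=user+query&elevateIds=id1,id2,id3,id4,id5&forceElevation=true

The {!terms} filter at least collapses step 2 into a single round trip; merging
steps 2 and 3 into one request would still need something like the XJoin
approach above.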