Re: Solr 6 managed-schema & version control

2016-07-27 Thread John Bickerstaff
Erick - the UI you mention -- is that something that exists or something that has
to be built? (I'm upgrading to version 6 as well and this question is one
I'll have to deal with...)

On Wed, Jul 27, 2016 at 5:31 PM, Rachid Bouacheria wrote:

> Thank you very much Erick, I appreciate your feedback.
>
> On Wed, Jul 27, 2016 at 2:24 PM, Erick Erickson wrote:
>
> > Using classic schema is perfectly acceptable/reasonable, you can
> > continue to do so freely (you'll have to change to
> > ClassicIndexSchemaFactory though).
> >
> > Also, you can freely edit managed-schema just as you did schema.xml.
> > The "trick" here is that you have to take some care _not_ to issue
> > commands that modify the schema or the in-memory version will
> > overwrite the one in ZK. Otherwise, though, you can freely use
> > managed-schema just as you do classic schema.
> >
> > So you can do just what you do now, keep managed-schema in VCS and
> > upconfig it. Also note that Solr 6.2 has "bin/solr zk
> > upconfig/downconfig/cp/mv/ls" functionality.
> >
> > Managed lends itself to some kind of UI that maintains it. The process
> > (IMO) for using that in prod would be something like:
> > > Use the UI to build your schema
> > > copy from ZK to your local machine
> > > put the configs in VCS
> > > Deploy using the VCS as your system-of-record.
> >
> > But that's just my approach. If you don't want to use the
> > managed-schema features, switch back to classic IMO.
> >
> > Best,
> > Erick
> >
> >
> >
> >
> >
> > On Wed, Jul 27, 2016 at 11:37 AM, Rachid Bouacheria wrote:
> > > Hi All,
> > >
> > > I am upgrading from solr 4 to 6.
> > > In solr 4 I have a schema.xml that is under version control.
> > > But solr 6 has the notion of a managed schema that could be modified
> via
> > a
> > > solr api call.
> > > This seems great and flexible, but my assumption is that in this case
> > > zookeeper becomes the authoritative copy and not SVN or Git.
> > >
> > > And this is where things become unclear to me.
> > > Is the expectation to download the configuration from zk the same way
> we
> > do
> > > an svn checkout to have the configuration and run locally?
> > > How do we know who changed what and when?
> > >
> > > I know that there still is the option to use schema.xml by using
> > > the ClassicIndexSchemaFactory but I am curious to hear from y'all that
> > use
> > > managed schema how you are doing it and if there are any downside,
> > gotchas,
> > > or if all is just much better :-)
> > >
> > > Seems to me that running locally is harder as you cannot just checkout
> a
> > > project that contains the up to date schema.
> > >
> > > Thank you,
> > > Rachid.
> >
>


Re: Uploading all files under directory with no extension

2016-07-27 Thread Alexandre Rafalovitch
I believe the extensions are used for type guessing normally.

You could try explicitly specifying the files if they are in one directory
and you only have non-extension files. Or you could do a find|grep
-v|xargs -n command sequence to find whatever you need and feed it to
the post script.
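
Untested, but a rough SolrJ equivalent would be something along these lines
(the collection name, directory and the text/csv content type are only
assumptions to match the earlier command); it walks the directory and posts
every regular file with an explicit content type, so the missing extension
no longer matters:

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PostExtensionlessTsv {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycol1");
    File dir = new File("/dev/datascience/pod1/population/baseline/");
    for (File f : dir.listFiles()) {
      if (!f.isFile()) continue;                               // skip sub-directories
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
      req.addFile(f, "text/csv");                              // explicit content type, no guessing from extension
      req.setParam("separator", "\t");                         // tab-separated input
      client.request(req);
    }
    client.commit();
    client.close();
  }
}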

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 28 July 2016 at 02:12, Nirav Patel  wrote:
> I tried using the post tool with the following parameters. Looks like it's not
> uploading files if they don't have a known extension.
>
>
> ./bin/post -c mycol1  -params "separator=%09" -type text/tsv -filetypes tsv
>  /dev/datascience/pod1/population/baseline/
>
> /usr/java/jdk1.8.0_102//bin/java -classpath
> /home/xactly/solr-6.1.0/dist/solr-core-6.1.0.jar -Dauto=yes -Dc=bonusOrder
> -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
> /mapr/insights/datascience/rulec/prdx/bonusOrderType/baseline/
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/mycol1/update...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory /dev/datascience/pod1/population/baseline/ (0 files,
> depth=0)
> 0 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/mycol1/update...
> Time spent: 0:00:00.056
>
>
> All the files I have under /dev/datascience/pod1/population/baseline/
> have the same structure and they are all TSVs. The only thing is they don't have
> any extension.
>
> Is there a way to upload them using the CLI?
>
> Thanks
>


Re: Solr 6 managed-schema & version control

2016-07-27 Thread Rachid Bouacheria
Thank you very much Erick, I appreciate your feedback.

On Wed, Jul 27, 2016 at 2:24 PM, Erick Erickson wrote:

> Using classic schema is perfectly acceptable/reasonable, you can
> continue to do so freely (you'll have to change to
> ClassicIndexSchemaFactory though).
>
> Also, you can freely edit managed-schema just as you did schema.xml.
> The "trick" here is that you have to take some care _not_ to issue
> commands that modify the schema or the in-memory version will
> overwrite the one in ZK. Otherwise, though, you can freely use
> managed-schema just as you do classic schema.
>
> So you can do just what you do now, keep managed-schema in VCS and
> upconfig it. Also note that Solr 6.2 has "bin/solr zk
> upconfig/downconfig/cp/mv/ls" functionality.
>
> Managed lends itself to some kind of UI that maintains it. The process
> (IMO) for using that in prod would be something like:
> > Use the UI to build your schema
> > copy from ZK to your local machine
> > put the configs in VCS
> > Deploy using the VCS as your system-of-record.
>
> But that's just my approach. If you don't want to use the
> managed-schema features, switch back to classic IMO.
>
> Best,
> Erick
>
>
>
>
>
> On Wed, Jul 27, 2016 at 11:37 AM, Rachid Bouacheria wrote:
> > Hi All,
> >
> > I am upgrading from solr 4 to 6.
> > In solr 4 I have a schema.xml that is under version control.
> > But solr 6 has the notion of a managed schema that could be modified via
> a
> > solr api call.
> > This seems great and flexible, but my assumption is that in this case
> > zookeeper becomes the authoritative copy and not SVN or Git.
> >
> > And this is where things become unclear to me.
> > Is the expectation to download the configuration from zk the same way we
> do
> > an svn checkout to have the configuration and run locally?
> > How do we know who changed what and when?
> >
> > I know that there still is the option to use schema.xml by using
> > the ClassicIndexSchemaFactory but I am curious to hear from y'all that
> use
> > managed schema how you are doing it and if there are any downside,
> gotchas,
> > or if all is just much better :-)
> >
> > Seems to me that running locally is harder as you cannot just checkout a
> > project that contains the up to date schema.
> >
> > Thank you,
> > Rachid.
>


Re: AnalyticsQuery fails on a sharded collection

2016-07-27 Thread Joel Bernstein
The finish() method operates on the search node, not the aggregator node.
So whether it's distributed shouldn't affect how it runs. If you can post
your code I might be able to see the issue.

As far as using a MergeStrategy, I would suggest creating a streaming
expression that handles the merge. This is a much cleaner approach. An
example of how this works can be seen in this patch:

https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch

The AnalyticsQuery in this case is:

TextLogisticRegressionQParserPlugin.java

The expression is:

TextLogitStream.java

The TextLogitStream has sample code for calling the shards and merging
the results.

If you want to use this approach the following patch is needed so you
can add your own streaming expression:

https://issues.apache.org/jira/browse/SOLR-9103

This will likely be in 6.2.











Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 27, 2016 at 5:36 PM, tedsolr  wrote:

> I'm looking to create a merge strategy for a custom QParserPlugin I have.
> The
> plugin works fine on collections with one shard. I was very surprised to
> see
> it throw an exception when I ran it against a sharded collection. So my
> question is a bit of a shot in the dark. I'll first note that the
> CollapsingQParserPlugin included with Solr works as expected on my test
> collection with two shards.
>
> The NPE occurs in my DelegatingCollector's finish() method as it's setting
> the next doc base. It appears I have a null LeafReaderContext. Without
> knowing anything about my code, what is it about multiple shards that might
> throw off a collector like this?
>
> thanks!
> v5.2.1
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Is there a way to filter the results based on weight - SOLR suggester?

2016-07-27 Thread bbarani
Hi,

I am using the suggester component in SOLR 5.5.1 and sort the matching
suggestions based on a custom field (lookupCount). The below
configuration seems to work fine but it's returning the matching term even if
the weight is set to 0. Is there a way to restrict returning the matching
term based on the weight field?

Something like: return only the matching suggestions that don't have the value
of the weight field set to 0?

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">userQuery</str>
    <str name="weightField">lookupCount</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>


Current sample output (returns the matching term even though it has weight set to
0):

/solr/test/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=bill&suggest.count=5

<lst name="suggest">
  <lst name="mySuggester">
    <lst name="bill">
      <int name="numFound">5</int>
      <arr name="suggestions">
        <lst>
          <str name="term">bill</str>
          <long name="weight">0</long>
          <str name="payload"/>
        </lst>
        <lst>
          <str name="term">billing history</str>
          <long name="weight">69753</long>
          <str name="payload"/>
        </lst>
        ...
      </arr>
    </lst>
  </lst>
</lst>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-filter-the-results-based-on-weight-SOLR-suggester-tp4289286.html
Sent from the Solr - User mailing list archive at Nabble.com.


AnalyticsQuery fails on a sharded collection

2016-07-27 Thread tedsolr
I'm looking to create a merge strategy for a custom QParserPlugin I have. The
plugin works fine on collections with one shard. I was very surprised to see
it throw an exception when I ran it against a sharded collection. So my
question is a bit of a shot in the dark. I'll first note that the
CollapsingQParserPlugin included with Solr works as expected on my test
collection with two shards.

The NPE occurs in my DelegatingCollector's finish() method as it's setting
the next doc base. It appears I have a null LeafReaderContext. Without
knowing anything about my code, what is it about multiple shards that might
throw off a collector like this?

thanks!
v5.2.1 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6 managed-schema & version control

2016-07-27 Thread Erick Erickson
Using classic schema is perfectly acceptable/reasonable, you can
continue to do so freely (you'll have to change to
ClassicIndexSchemaFactory though).

Also, you can freely edit managed-schema just as you did schema.xml.
The "trick" here is that you have to take some care _not_ to issue
commands that modify the schema or the in-memory version will
overwrite the one in ZK. Otherwise, though, you can freely use
managed-schema just as you do classic schema.

So you can do just what you do now, keep managed-schema in VCS and
upconfig it. Also note that Solr 6.2 has "bin/solr zk
upconfig/downconfig/cp/mv/ls" functionality.

Managed lends itself to some kind of UI that maintains it. The process
(IMO) for using that in prod would be something like:
> Use the UI to build your schema
> copy from ZK to your local machine
> put the configs in VCS
> Deploy using the VCS as your system-of-record.
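
For the "copy from ZK to your local machine" step, besides "bin/solr zk
downconfig" or zkcli.sh, a small SolrJ sketch like the following also works
(the ZK address, config set name and local path are made up):

import java.nio.file.Paths;
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkConfigManager;

public class PullConfigFromZk {
  public static void main(String[] args) throws Exception {
    SolrZkClient zkClient = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 30000);
    try {
      ZkConfigManager mgr = new ZkConfigManager(zkClient);
      // pull configs/myconfig out of ZK into a local checkout that lives in VCS
      mgr.downloadConfigDir("myconfig", Paths.get("/path/to/vcs/checkout/myconfig"));
      // after review and commit, the same manager can push it back:
      // mgr.uploadConfigDir(Paths.get("/path/to/vcs/checkout/myconfig"), "myconfig");
    } finally {
      zkClient.close();
    }
  }
}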

But that's just my approach. If you don't want to use the
managed-schema features, switch back to classic IMO.

Best,
Erick





On Wed, Jul 27, 2016 at 11:37 AM, Rachid Bouacheria  wrote:
> Hi All,
>
> I am upgrading from solr 4 to 6.
> In solr 4 I have a schema.xml that is under version control.
> But solr 6 has the notion of a managed schema that could be modified via a
> solr api call.
> This seems great and flexible, but my assumption is that in this case
> zookeeper becomes the authoritative copy and not SVN or Git.
>
> And this is where things become unclear to me.
> Is the expectation to download the configuration from zk the same way we do
> an svn checkout to have the configuration and run locally?
> How do we know who changed what and when?
>
> I know that there still is the option to use schema.xml by using
> the ClassicIndexSchemaFactory but I am curious to hear from y'all that use
> managed schema how you are doing it and if there are any downside, gotchas,
> or if all is just much better :-)
>
> Seems to me that running locally is harder as you cannot just checkout a
> project that contains the up to date schema.
>
> Thank you,
> Rachid.


Solr 6 managed-schema & version control

2016-07-27 Thread Rachid Bouacheria
Hi All,

I am upgrading from solr 4 to 6.
In solr 4 I have a schema.xml that is under version control.
But Solr 6 has the notion of a managed schema that can be modified via a
Solr API call.
This seems great and flexible, but my assumption is that in this case
zookeeper becomes the authoritative copy and not SVN or Git.

And this is where things become unclear to me.
Is the expectation to download the configuration from zk the same way we do
an svn checkout to have the configuration and run locally?
How do we know who changed what and when?

I know that there still is the option to use schema.xml by using
the ClassicIndexSchemaFactory but I am curious to hear from y'all that use
managed schema how you are doing it and if there are any downsides, gotchas,
or if it is all just much better :-)

Seems to me that running locally is harder as you cannot just check out a
project that contains the up-to-date schema.

Thank you,
Rachid.


Re: SolrCloud create_collection not uploading configs to zookeeper

2016-07-27 Thread Nirav Patel
Got it. That makes sense.

Thanks

On Wed, Jul 27, 2016 at 9:08 AM, Shawn Heisey  wrote:

> On 7/26/2016 5:30 PM, Nirav Patel wrote:
> > OK, I can see the '/configs' directory in the Solr UI and under that I can see
> > the configuration for my 'test' collection. But this all seemed to be
> > disjointed information. The doc is definitely not clear. And what does that
> > Tree represent anyway - where's the information for that? I would ideally
> > put a link to that document.
>
> You won't find a literal directory named "/configs" anywhere.
>
> This is a path within the zookeeper *database*.  Zookeeper presents the
> database as something that looks like a filesystem, so it's easier for
> programmers to conceptualize when they are writing programs that use
> zookeeper ... but it is still a database, not a literal filesystem.
>
> Thanks,
> Shawn
>
>




SolrCloud: Failure to recover on restart following OutOfMemoryError

2016-07-27 Thread Kelly, Frank
Hi All,

 We have a SolrCloud cluster with 3 Virtual Machines, assigning 4GB to the Java 
Heap.
Recently we added a number of collections to the machine, going from around 80
collections (each with 3 shards x 3 replicas) to 150 collections.

We've hit Heap errors.
That wasn't the surprise; the surprise was that when I restarted - allowing
more Xmx heap for Java (now 6GB) - Solr did not and could not recover
despite having enough memory.
It was complaining about ZooKeeper status - "ZooKeeper thinks I am the leader
but I am not".
I can only successfully recover by shutting off Solr, nuking the ZooKeeper
configs, recreating the configs, restarting Solr and deleting all
collections.

Shouldn't Solr be able to recover on restart or does OutOfMemoryError cause 
some kind of Zk/Solr cluster state corruption that is unrecoverable?

-Frank
Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)






HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W








Uploading all files under directory with no extension

2016-07-27 Thread Nirav Patel
I tried using the post tool with the following parameters. Looks like it's not
uploading files if they don't have a known extension.


./bin/post -c mycol1  -params "separator=%09" -type text/tsv -filetypes tsv
 /dev/datascience/pod1/population/baseline/

/usr/java/jdk1.8.0_102//bin/java -classpath
/home/xactly/solr-6.1.0/dist/solr-core-6.1.0.jar -Dauto=yes -Dc=bonusOrder
-Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
/mapr/insights/datascience/rulec/prdx/bonusOrderType/baseline/
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/mycol1/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory /dev/datascience/pod1/population/baseline/ (0 files,
depth=0)
0 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/mycol1/update...
Time spent: 0:00:00.056


All the files I have under /dev/datascience/pod1/population/baseline/
have the same structure and they are all TSVs. The only thing is they don't have
any extension.

Is there a way to upload them using the CLI?

Thanks



Re: SolrCloud create_collection not uploading configs to zookeeper

2016-07-27 Thread Shawn Heisey
On 7/26/2016 5:30 PM, Nirav Patel wrote:
> OK, I can see the '/configs' directory in the Solr UI and under that I can see
> the configuration for my 'test' collection. But this all seemed to be
> disjointed information. The doc is definitely not clear. And what does that
> Tree represent anyway - where's the information for that? I would ideally
> put a link to that document.

You won't find a literal directory named "/configs" anywhere.

This is a path within the zookeeper *database*.  Zookeeper presents the
database as something that looks like a filesystem, so it's easier for
programmers to conceptualize when they are writing programs that use
zookeeper ... but it is still a database, not a literal filesystem.
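
For instance, you can list what lives under that znode from code -- a minimal
SolrJ sketch, assuming ZooKeeper is on localhost:2181 (the Cloud > Tree view in
the admin UI shows the same znodes):

import org.apache.solr.common.cloud.SolrZkClient;

public class ListConfigSets {
  public static void main(String[] args) throws Exception {
    SolrZkClient zk = new SolrZkClient("localhost:2181", 30000);
    try {
      // "/configs" is a znode path inside ZooKeeper's database, not a directory on disk
      for (String configSet : zk.getChildren("/configs", null, true)) {
        System.out.println(configSet);
      }
    } finally {
      zk.close();
    }
  }
}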

Thanks,
Shawn



Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
If there is a problem with a single index then it might also be in CloudSolr.
As far as I could figure out from INFOSTREAM, documents are added to segments
and terms are "collected". Duplicate terms are "deleted" (or whatever).
These deletes (or whatever) are not concurrent.
I have lines like:
BD 0 [Wed Jul 27 13:28:48 GMT+01:00 2016; Thread-27879]: applyDeletes: infos=...
BD 0 [Wed Jul 27 13:31:48 GMT+01:00 2016; Thread-27879]: applyDeletes took 
180028 msec
...
BD 0 [Wed Jul 27 13:42:03 GMT+01:00 2016; Thread-27890]: applyDeletes: infos=...
BD 0 [Wed Jul 27 14:38:55 GMT+01:00 2016; Thread-27890]: applyDeletes took 
3411845 msec

3411845 msec is about 56 minutes where the system is doing what???
At least not indexing, because there is only one JAVA process and no I/O at all!

How can SolrJ help me now with this problem?

Best
Bernd


On 27.07.2016 at 16:41, Erick Erickson wrote:
> Well, at least it'll be easier to debug in my experience. Simple example.
> At some point you'll call CloudSolrClient.add(doc list). Comment just that
> out and you'll be able to isolate whether the issue is querying the back end or
> sending to Solr.
> 
> Then CloudSolrClient (assuming SolrCloud) has efficiencies in terms of
> routing...
> 
> Best
> Erick
> 
On Jul 27, 2016 7:24 AM, "Bernd Fehling" wrote:
> 
>> So writing some SolrJ doing the same job as the DIH script
>> and using that concurrent will solve my problem?
>> I'm not using Tika.
>>
>> I don't think that DIH is my problem, even if it is not the best solution
>> right now.
>> Nevertheless, you are right SolrJ has higher performance, but what
>> if I have the same problems with SolrJ like with DIH?
>>
>> If it runs with DIH it should run with SolrJ with additional performance
>> boost.
>>
>> Bernd
>>
>>
>> On 27.07.2016 at 16:03, Erick Erickson:
>>> I'd actually recommend you move to a SolrJ solution
>>> or similar. Currently, you're putting a load on the Solr
>>> servers (especially if you're also using Tika) in addition
>>> to all indexing etc.
>>>
>>> Here's a sample:
>>> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>>>
>>> Dodging the question I know, but DIH sometimes isn't
>>> the best solution.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling
>>>  wrote:
 After enhancing the server with SSDs I'm trying to speed up indexing.

 The server has 16 CPUs and more than 100G RAM.
 JAVA (1.8.0_92) has 24G.
 SOLR is 4.10.4.
 Plain XML data to load is 218G with about 96M records.
 This will result in a single index of 299G.

 I tried with 4, 8, 12 and 16 concurrent DIHs.
 16 and 12 was to much because for 16 CPUs and my test continued with 8
>> concurrent DIHs.
 Then i was trying different  and  settings
>> but now I'm stuck.
 I can't figure out what is the best setting for bulk indexing.
 What I see is that the indexing is "falling asleep" after some time of
>> indexing.
 It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del,...

 
 8
 1024
 -1
 
   8
   100
   512
 
 8
 > class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
 ${solr.lock.type:native}
 ...
 

 
  ### no autocommit at all
  
${solr.autoSoftCommit.maxTime:-1}
  
 



>> command=full-import=false=false=false=false
 After indexing finishes there is a final optimize.

 My idea is, if 8 DIHs use 8 CPUs then I have 8 CPUs left for merging
 (maxIndexingThreads/maxMergeAtOnce/mergeFactor).
 It should do no commit, no optimize.
 ramBufferSizeMB is high because I have plenty of RAM and I want make
>> use the speed of RAM.
 segmentsPerTier is high to reduce merging.

 But somewhere is a misconfiguration because indexing gets stalled.

 Any idea what's going wrong?


 Bernd

>>
> 

-- 
*
Bernd Fehling                Bielefeld University Library
Dipl.-Inform. (FH)           LibTec - Library Technology
Universitätsstr. 25          and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Solr: Block Join Faceting

2016-07-27 Thread Morse, Matthew K.
Sorry for the double post.


Solr: Block Join Faceting

2016-07-27 Thread Morse, Matthew K.
I have a document with multiple different children.  Below is the structure:

<add>
  <doc>
    <field name="id">1</field>
    <field name="isParent">true</field>
    <doc>
      <field name="id">11</field>
      <field name="isParent">false</field>
      <field name="predictiveModelId">22</field>
      <field name="predictiveModelRank">6</field>
    </doc>
    <doc>
      <field name="id">12</field>
      <field name="isParent">false</field>
      <field name="predictiveModelId">11</field>
      <field name="predictiveModelRank">7276</field>
    </doc>
    <doc>
      <field name="id">13</field>
      <field name="isParent">false</field>
      <field name="employeeId">100419</field>
    </doc>
    <doc>
      <field name="id">14</field>
      <field name="isParent">false</field>
      <field name="fundCode">245</field>
    </doc>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="isParent">true</field>
    <doc>
      <field name="id">15</field>
      <field name="isParent">false</field>
      <field name="predictiveModelId">92834</field>
      <field name="predictiveModelRank">6</field>
    </doc>
    <doc>
      <field name="id">16</field>
      <field name="isParent">false</field>
      <field name="predictiveModelId">89</field>
      <field name="predictiveModelRank">345</field>
    </doc>
    <doc>
      <field name="id">17</field>
      <field name="isParent">false</field>
      <field name="employeeId">64651</field>
    </doc>
    <doc>
      <field name="id">19</field>
      <field name="isParent">false</field>
      <field name="fundCode">888</field>
    </doc>
  </doc>
</add>

I am trying to write a query which will return child facets but I am either 
receiving an error or no results.  I'm wondering if someone could point me in 
the right direction.
I am working with Solr 6.1.0.

Here is my schema:


  
  
  
  
  
  


1. With this query I receive no facet results. The main response is
correct, but I was expecting to see a fundCode facet of 245 with a count of 1. Below is
the query with the output.


... 
/Clients/bjqfacet?fq={!parent%20which=isParent:true}fundCode:245&fq={!parent%20which=isParent:true}predictiveModelId:22&fq={!parent%20which=isParent:true}predictiveModelRank:7276&debug=on&q={!parent%20which=isParent:true}employeeId:100419&wt=json&child.facet.field=fundCode&indent=true



{

  "responseHeader":{

"status":0,

"QTime":2},

  "response":{"numFound":1,"start":0,"docs":[

  {

"id":"7258",

"_version_":1540942803838697474}]

  },

  "facet_counts":[

"facet_fields",[

  "fundCode",[]]],

  "debug":{

"rawquerystring":"{!parent which=isParent:true}employeeId:100419",

"querystring":"{!parent which=isParent:true}employeeId:100419",

"parsedquery":"AllParentsAware(ToParentBlockJoinQuery 
(employeeId:`\u000b\\k\u0017#))",

"parsedquery_toString":"ToParentBlockJoinQuery 
(employeeId:`\u000b\\k\u0017#)",

"explain":{

  "7258":"\n0.0 = Score based on 1 child docs in range from 822434 to 
822469, best match:\n  4.0979195 = weight(employeeId:`\u000b\\k\u0017# in 
822441) [], result of:\n4.0979195 = score(doc=822441,freq=1.0 = 
termFreq=1.0\n), product of:\n  4.0979195 = idf(docFreq=269330, 
docCount=16217701)\n  1.0 = tfNorm, computed from:\n1.0 = 
termFreq=1.0\n1.2 = parameter k1\n0.0 = parameter b (norms 
omitted for field)\n"},

"QParser":"BlockJoinParentQParser",

"filter_queries":["{!parent which=isParent:true}fundCode:245",

  "{!parent which=isParent:true}predictiveModelId:22",

  "{!parent which=isParent:true}predictiveModelRank:7276"],

"parsed_filter_queries":["AllParentsAware(ToParentBlockJoinQuery 
(fundCode:245))",

  "AllParentsAware(ToParentBlockJoinQuery 
(predictiveModelId:`\b\u\u\u\u0016))",

  "AllParentsAware(ToParentBlockJoinQuery 
(predictiveModelRank:`\b\u\u8l))",

  "BlockJoinFacetFilter(null)"],

"timing":{

  "time":2.0,

  "prepare":{

"time":0.0,

"query":{

  "time":0.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":0.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"bjqFacetComponent":{

  "time":0.0},

"debug":{

  "time":0.0}},

  "process":{

"time":1.0,

"query":{

  "time":0.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":0.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"bjqFacetComponent":{

  "time":0.0},

"debug":{

  "time":0.0}



2. If I change the child facet to another field, like an int, it blows up.
Again I was expecting to see two results for the predictiveModelId facet: 11 with a count of 1
and 22 with a count of 1.


.../Clients/bjqfacet?fq={!parent%20which=isParent:true}fundCode:245&fq={!parent%20which=isParent:true}predictiveModelId:22&fq={!parent%20which=isParent:true}predictiveModelRank:7276&debug=on&q={!parent%20which=isParent:true}employeeId:100419&wt=json&child.facet.field=predictiveModelId&indent=true



{

  "responseHeader":{

"status":500,

"QTime":1},

  "error":{

"msg":"unexpected docvalues type NUMERIC for field 'predictiveModelId' 
(expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with 
docvalues.",

"trace":"java.lang.IllegalStateException: unexpected docvalues type NUMERIC 
for field 'predictiveModelId' (expected one of [SORTED, SORTED_SET]). Use 
UninvertingReader or index with docvalues.\n\tat 
org.apache.lucene.index.DocValues.checkField(DocValues.java:212)\n\tat 
org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)\n\tat 
org.apache.solr.search.join.BlockJoinFieldFacetAccumulator.initSegmentData(BlockJoinFieldFacetAccumulator.java:77)\n\tat
 

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
Well, at least it'll be easier to debug in my experience. Simple example.
At some point you'll call CloudSolrClient.add(doc list). Comment just that
out and you'll be able to isolate whether the issue is querying the back end or
sending to Solr.

Then CloudSolrClient (assuming SolrCloud) has efficiencies in terms of
routing...

Best
Erick

On Jul 27, 2016 7:24 AM, "Bernd Fehling" wrote:

> So writing some SolrJ doing the same job as the DIH script
> and using that concurrent will solve my problem?
> I'm not using Tika.
>
> I don't think that DIH is my problem, even if it is not the best solution
> right now.
> Nevertheless, you are right SolrJ has higher performance, but what
> if I have the same problems with SolrJ like with DIH?
>
> If it runs with DIH it should run with SolrJ with additional performance
> boost.
>
> Bernd
>
>
> On 27.07.2016 at 16:03, Erick Erickson:
> > I'd actually recommend you move to a SolrJ solution
> > or similar. Currently, you're putting a load on the Solr
> > servers (especially if you're also using Tika) in addition
> > to all indexing etc.
> >
> > Here's a sample:
> > https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
> >
> > Dodging the question I know, but DIH sometimes isn't
> > the best solution.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling
> >  wrote:
> >> After enhancing the server with SSDs I'm trying to speed up indexing.
> >>
> >> The server has 16 CPUs and more than 100G RAM.
> >> JAVA (1.8.0_92) has 24G.
> >> SOLR is 4.10.4.
> >> Plain XML data to load is 218G with about 96M records.
> >> This will result in a single index of 299G.
> >>
> >> I tried with 4, 8, 12 and 16 concurrent DIHs.
> >> 16 and 12 was to much because for 16 CPUs and my test continued with 8
> concurrent DIHs.
> >> Then i was trying different  and  settings
> but now I'm stuck.
> >> I can't figure out what is the best setting for bulk indexing.
> >> What I see is that the indexing is "falling asleep" after some time of
> indexing.
> >> It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del,...
> >>
> >> 
> >> 8
> >> 1024
> >> -1
> >> 
> >>   8
> >>   100
> >>   512
> >> 
> >> 8
> >>  class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
> >> ${solr.lock.type:native}
> >> ...
> >> 
> >>
> >> 
> >>  ### no autocommit at all
> >>  
> >>${solr.autoSoftCommit.maxTime:-1}
> >>  
> >> 
> >>
> >>
> >>
> command=full-import=false=false=false=false
> >> After indexing finishes there is a final optimize.
> >>
> >> My idea is, if 8 DIHs use 8 CPUs then I have 8 CPUs left for merging
> >> (maxIndexingThreads/maxMergeAtOnce/mergeFactor).
> >> It should do no commit, no optimize.
> >> ramBufferSizeMB is high because I have plenty of RAM and I want make
> use the speed of RAM.
> >> segmentsPerTier is high to reduce merging.
> >>
> >> But somewhere is a misconfiguration because indexing gets stalled.
> >>
> >> Any idea what's going wrong?
> >>
> >>
> >> Bernd
> >>
>


Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
So writing some SolrJ code doing the same job as the DIH script
and using that concurrently will solve my problem?
I'm not using Tika.

I don't think that DIH is my problem, even if it is not the best solution right 
now.
Nevertheless, you are right that SolrJ has higher performance, but what
if I have the same problems with SolrJ as with DIH?

If it runs with DIH it should run with SolrJ, with an additional performance boost.

Bernd


On 27.07.2016 at 16:03, Erick Erickson:
> I'd actually recommend you move to a SolrJ solution
> or similar. Currently, you're putting a load on the Solr
> servers (especially if you're also using Tika) in addition
> to all indexing etc.
> 
> Here's a sample:
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
> 
> Dodging the question I know, but DIH sometimes isn't
> the best solution.
> 
> Best,
> Erick
> 
> On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling
>  wrote:
>> After enhancing the server with SSDs I'm trying to speed up indexing.
>>
>> The server has 16 CPUs and more than 100G RAM.
>> JAVA (1.8.0_92) has 24G.
>> SOLR is 4.10.4.
>> Plain XML data to load is 218G with about 96M records.
>> This will result in a single index of 299G.
>>
>> I tried with 4, 8, 12 and 16 concurrent DIHs.
>> 16 and 12 was to much because for 16 CPUs and my test continued with 8 
>> concurrent DIHs.
>> Then i was trying different  and  settings but 
>> now I'm stuck.
>> I can't figure out what is the best setting for bulk indexing.
>> What I see is that the indexing is "falling asleep" after some time of 
>> indexing.
>> It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del,...
>>
>> 
>> 8
>> 1024
>> -1
>> 
>>   8
>>   100
>>   512
>> 
>> 8
>> > class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>> ${solr.lock.type:native}
>> ...
>> 
>>
>> 
>>  ### no autocommit at all
>>  
>>${solr.autoSoftCommit.maxTime:-1}
>>  
>> 
>>
>>
>> command=full-import=false=false=false=false
>> After indexing finishes there is a final optimize.
>>
>> My idea is, if 8 DIHs use 8 CPUs then I have 8 CPUs left for merging
>> (maxIndexingThreads/maxMergeAtOnce/mergeFactor).
>> It should do no commit, no optimize.
>> ramBufferSizeMB is high because I have plenty of RAM and I want make use the 
>> speed of RAM.
>> segmentsPerTier is high to reduce merging.
>>
>> But somewhere is a misconfiguration because indexing gets stalled.
>>
>> Any idea what's going wrong?
>>
>>
>> Bernd
>>


Re: Is it possible to force a Shard Leader change?

2016-07-27 Thread Erick Erickson
The REBALANCELEADERS stuff was put in to deal with 100s of leaders
winding up on a single machine in a case where extremely high
throughput was required. Until you get into pretty high scale the
additional "work" on a leader is minimal. So unless your CPU usage is
consistently significantly higher on the machine with all the leaders,
I wouldn't worry about it.

Otherwise there isn't much you can do I'm afraid. If you have
asymmetric replica placement leaders will tend to different machines.
You could try to take the JVM down on the machine with all the leaders
and let leader election redistribute, but that's not a long-term
solution.

Best,
Erick

On Tue, Jul 26, 2016 at 9:27 PM, Tim Chen  wrote:
> Hi Guys,
>
> I am running a Solr Cloud 4.10, with 4 Solr servers and 5 Zookeeper setup.
>
> Solr servers:
> solr01, solr02, solr03, solr04
>
> I have around 20 collections in Solr cloud, and there are 4 Shards for each 
> Collection. For each Shard, I have 4 Replicas, one sitting on each Solr
> server, with one of them being the Shard Leader.
>
> The issue I am having right now is all the Shard Leader are pointing to the 
> same server, eg: solr01.  When there are documents update, they are all 
> pushed to the Leader. I really want to distribute the Shard Leader across all 
> 4 Solr servers.
>
> I noticed Solr 6 has a "REBALANCELEADERS" command to do that, but not 
> available in Solr 4.
>
> Questions:
>
> 1, Is my setup OK, with 4 Shards for each Collection and 4 Replicas for each
> Shard? Each Solr server has a full set of documents.
> 2, To distribute the Shard Leader to different Solr servers, can I somehow 
> shutdown a single Replica that is currently a Shard Leader and force Solr to 
> elect a different replica to be new Shard Leader?
>
> Thanks guys!
>
> Regards,
> Tim
>
>


Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
I'd actually recommend you move to a SolrJ solution
or similar. Currently, you're putting a load on the Solr
servers (especially if you're also using Tika) in addition
to all indexing etc.

Here's a sample:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
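
The core of it is just batching SolrInputDocuments to a client, roughly like the
sketch below (field names, batch size and ConcurrentUpdateSolrClient are only
assumptions here; on 4.x the class is ConcurrentUpdateSolrServer, and with
SolrCloud you'd use CloudSolrClient instead):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // queue of 10000 docs, 8 background threads doing the HTTP work
    ConcurrentUpdateSolrClient client =
        new ConcurrentUpdateSolrClient("http://localhost:8983/solr/mycore", 10000, 8);
    List<SolrInputDocument> batch = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) {          // stand-in for reading your XML source
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("title", "record " + i);
      batch.add(doc);
      if (batch.size() == 1000) {                // send batches, not one doc at a time
        client.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) client.add(batch);
    client.commit();                             // one commit at the end, as with the DIH setup
    client.close();
  }
}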

Dodging the question I know, but DIH sometimes isn't
the best solution.

Best,
Erick

On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling
 wrote:
> After enhancing the server with SSDs I'm trying to speed up indexing.
>
> The server has 16 CPUs and more than 100G RAM.
> JAVA (1.8.0_92) has 24G.
> SOLR is 4.10.4.
> Plain XML data to load is 218G with about 96M records.
> This will result in a single index of 299G.
>
> I tried with 4, 8, 12 and 16 concurrent DIHs.
> 16 and 12 was to much because for 16 CPUs and my test continued with 8 
> concurrent DIHs.
> Then i was trying different  and  settings but 
> now I'm stuck.
> I can't figure out what is the best setting for bulk indexing.
> What I see is that the indexing is "falling asleep" after some time of 
> indexing.
> It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del,...
>
> 
> 8
> 1024
> -1
> 
>   8
>   100
>   512
> 
> 8
> 
> ${solr.lock.type:native}
> ...
> 
>
> 
>  ### no autocommit at all
>  
>${solr.autoSoftCommit.maxTime:-1}
>  
> 
>
>
> command=full-import=false=false=false=false
> After indexing finishes there is a final optimize.
>
> My idea is, if 8 DIHs use 8 CPUs then I have 8 CPUs left for merging
> (maxIndexingThreads/maxMergeAtOnce/mergeFactor).
> It should do no commit, no optimize.
> ramBufferSizeMB is high because I have plenty of RAM and I want make use the 
> speed of RAM.
> segmentsPerTier is high to reduce merging.
>
> But somewhere is a misconfiguration because indexing gets stalled.
>
> Any idea what's going wrong?
>
>
> Bernd
>
>
>
>


Re: How to configure solr while having Apostrophes in fields

2016-07-27 Thread Erick Erickson
I'd _strongly_ recommend you become familiar with the
admin>>(your core)>>analysis page. It tells you exactly
what each filter does to your input and makes it much
simpler to answer questions like this. Hover over each
of the gray letter pairs (e.g. "SF" will be gray, hover over
it and you'll see that that's the "StopFilter").

In this case WordDelimiterFilterFactory is breaking
on all non-alphanumerics. Do note that when
you remove it, all the _other_ punctuation that it
strips will suddenly be relevant, e.g. in
"my dog has fleas." the period after "fleas" will
be part of that token, so you'll have to deal with that.

Best,
Erick

On Wed, Jul 27, 2016 at 4:58 AM, nitin.garg88  wrote:
> When I search for "plato" it returns all records with
> "plato, platos, plato's".
> When I search for "platos" it returns all records with "platos, plato's".
> When I search for "plato's" it returns all records with "platos, plato's".
>
> Please suggest how to configure schema.xml. Below is my "text" setting in
> schema.xml:
>
>autoGeneratePhraseQueries="true">
>   
> 
>
>
> 
>
>
>  ignoreCase="true"
>
> synonyms="${home}/solr-configuration/bibliographic-protected-synonyms.txt"/>
>
>
>  generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="0"
> stemEnglishPossessive="0" splitOnCaseChange="0"
>
> protected="${home}/solr-configuration/bibliographic-protwords.txt"/>
>
>
>  enablePositionIncrements="true"
>
> words="${home}/solr-configuration/bibliographic-stopwords.txt"/>
>
>
>  ignoreCase="true"
>
> synonyms="${home}/solr-configuration/bibliographic-synonyms.txt"/>
>
>
>  mappingFile="${home}/sort.map"/>
>   
>   
> 
> 
>  ignoreCase="true"
>
> synonyms="${home}/solr-configuration/bibliographic-protected-synonyms.txt"/>
>  generateWordParts="1" generateNumberParts="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0"
> stemEnglishPossessive="0" splitOnCaseChange="0"
>
> protected="${home}/solr-configuration/bibliographic-protwords.txt" />
>  enablePositionIncrements="true"
>
> words="${home}/solr-configuration/bibliographic-stopwords.txt"/>
>  ignoreCase="true"
>
> synonyms="${home}/solr-configuration/bibliographic-synonyms.txt"/>
>  mappingFile="${home}/sort.map"/>
>   
> 
>
> Thanks in advance !
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-configure-solr-while-having-Apostrophes-in-fields-tp4289196.html
> Sent from the Solr - User mailing list archive at Nabble.com.


problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
After enhancing the server with SSDs I'm trying to speed up indexing.

The server has 16 CPUs and more than 100G RAM.
JAVA (1.8.0_92) has 24G.
SOLR is 4.10.4.
Plain XML data to load is 218G with about 96M records.
This will result in a single index of 299G.

I tried with 4, 8, 12 and 16 concurrent DIHs.
16 and 12 were too much for 16 CPUs, so my test continued with 8
concurrent DIHs.
Then I was trying different  and  settings but now
I'm stuck.
I can't figure out what is the best setting for bulk indexing.
What I see is that the indexing is "falling asleep" after some time of indexing.
It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del,...

<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <maxBufferedDocs>-1</maxBufferedDocs>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">8</int>
    <int name="segmentsPerTier">100</int>
    <int name="maxMergedSegmentMB">512</int>
  </mergePolicy>
  <mergeFactor>8</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <lockType>${solr.lock.type:native}</lockType>
  ...
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  ### no autocommit at all
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>


command=full-import=false=false=false=false
After indexing finishes there is a final optimize.

My idea is, if 8 DIHs use 8 CPUs then I have 8 CPUs left for merging
(maxIndexingThreads/maxMergeAtOnce/mergeFactor).
It should do no commit, no optimize.
ramBufferSizeMB is high because I have plenty of RAM and I want to make use of the
speed of RAM.
segmentsPerTier is high to reduce merging.

But somewhere is a misconfiguration because indexing gets stalled.

Any idea what's going wrong?


Bernd






How to configure solr while having Apostrophes in fields

2016-07-27 Thread nitin.garg88
When I search for "plato" it returns all records with
"plato, platos, plato's".
When I search for "platos" it returns all records with "platos, plato's".
When I search for "plato's" it returns all records with "platos, plato's".

Please suggest how to configure schema.xml. Below is my "text" setting in
schema.xml:

  
  



















  
  







  


Thanks in advance !




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-solr-while-having-Apostrophes-in-fields-tp4289196.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Query Solr

2016-07-27 Thread Kostas
There are some examples on the web for this:
http://yonik.com/solr/query-syntax/
http://stackoverflow.com/questions/634765/using-or-and-not-in-solr-query
If you are using .NET, maybe also try SolrNet.

Maybe those help.





-----Original Message-----
From: Hardika Catur S [mailto:hardika.sa...@solusi247.com.INVALID] 
Sent: Wednesday, July 27, 2016 10:24 AM
To: solr-user@lucene.apache.org
Subject: Query Solr

Hi,

I want to create a query across multiple collections in Solr, where the equivalent
query in MySQL is "SELECT colection1.field_colection1 FROM colection1 WHERE
colection1.field_colection1 NOT IN (SELECT colection2.field_colection2 FROM
colection2);".

But I find it difficult to create that query. Please help me find a
solution.

Thanks,
Hardika CS.



Re: The Query Elevation Component

2016-07-27 Thread Alessandro Benedetti
Hi Ryan,
can you explain this ?
" I'd like the search request to search multiple
fields, but only elevate if the query is found in one of the fields."

You mean that you want to apply the elevation component only if the user
selected a particular field in the query?
If I remember well, you have the possibility of associating a list of
documents with each query you prefer in the elevation file.

But maybe I misunderstood your question: are you actually thinking of boosting
the results only if they have a certain match in a particular field?
Because maybe you are looking for the classic edismax with different field
boosting instead of the query elevation component.
Let us know and we can help you better!

Cheers

On Wed, Jul 27, 2016 at 4:49 AM, Ryan Yacyshyn wrote:

> Hi everyone,
>
> I'm reading the docs on the query elevation component and some questions
> came up:
>
> Can I specify a field that the elevate component will look at, such as only
> looking at the title field? My search handler (using eDisMax) is searching
> across multiple fields, but if I only want the elevate component to look at
> one field, is this possible? I'd like the search request to search multiple
> fields, but only elevate if the query is found in one of the fields.
>
> Also, is there a recommended way to analyze the query? For example, when
> using the queryFieldType parameter, I'd think I'd only want to use the
> KeywordTokenizer and maybe lowercasing.
>
> Thanks,
> Ryan
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr 5.5.2 mm parameter not working the same

2016-07-27 Thread elisabeth benoit
Oh sorry, I wrote too fast. I had to change the defaultOperator to OR.

Elisabeth

2016-07-27 10:11 GMT+02:00 elisabeth benoit :

>
> Hello,
>
> We are migrating from solr 4.10.1 to solr 5.5.2, and it seems that the mm
> parameter is not working the same anymore.
>
> In fact, as soon as there is a word not in the index in the query, no
> matter what mm value I send, I get no answer as if my query is a pure AND
> query.
>
> Does anyone have a clue?
>
> Best regards,
> Elisabeth
>
>


SOLR-7036 - Faster method for group.facet - new patch for trunk

2016-07-27 Thread danny teichthal
Hi,
SOLR-7036 introduced a new faster method for group.facet, which uses
UnInvertedField.
It was patched for version 4.x.
Over the last week, my colleague uploaded a new patch that works against the
trunk.

We would really appreciate if anyone could take a look at it and give us
some feedback about it.
Full details and performance tests results were also added to the JIRA
issue.

We are willing to work on it and, if possible, backport it to an older branch.

Link:
https://issues.apache.org/jira/browse/SOLR-7036


Thanks in advance,


Solr 5.5.2 mm parameter not working the same

2016-07-27 Thread elisabeth benoit
Hello,

We are migrating from solr 4.10.1 to solr 5.5.2, and it seems that the mm
parameter is not working the same anymore.

In fact, as soon as there is a word not in the index in the query, no
matter what mm value I send, I get no answer as if my query is a pure AND
query.

Does anyone have a clue?

Best regards,
Elisabeth


Query Solr

2016-07-27 Thread Hardika Catur S

Hi,

I want to create a query across multiple collections in Solr, where the equivalent
query in MySQL is
"SELECT colection1.field_colection1 FROM colection1 WHERE
colection1.field_colection1 NOT IN (SELECT colection2.field_colection2
FROM colection2);".


But I find it difficult to create that query. Please help me find a
solution.


Thanks,
Hardika CS.