Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Thanks for enlightening,  Shawn :)

I thought DIH made parallel DB requests for all the entities defined in a
document.

I do believe that DIH is easier to use; that's why I am trying to find a way
to use it in my current system. But as I explained above, since I have so
many sub-entities, each returning a list of rows that must be joined into the
parent, a full import of more than 200,000 (2 lakh) documents takes forever.

What I am looking for is a way to speed up my full import using DIH only. To
achieve this I tried to split the document definition in two and run the full
imports in parallel, but with this approach the latest import overwrites the
data indexed by the other, since the unique key (property_id) is the same for
both documents.

One way I can think of is to keep the documents in different cores, each
maintaining its own index files, and to merge the search results from both
cores at query time. But is this a good approach?
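
For reference, a sketch of the cross-core query idea, assuming two hypothetical
cores named properties_a and properties_b on one node; the shards parameter runs
the request as a distributed search and merges the results:

http://localhost:8983/solr/properties_a/select?q=*:*&shards=localhost:8983/solr/properties_a,localhost:8983/solr/properties_b

Keep in mind that the term statistics used for scoring are computed per shard by
default, so relevance can differ slightly from a single combined index.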






Re: extract multi-features for one solr feature extractor in solr learning to rank

2017-04-18 Thread Jianxiong Dong
Hi, Michael,
 Thanks for the very valuable feedback.

> You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.
I used this idea to extract some of the features from this paper
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/letor3.pdf),
e.g. the Table 2 features (1-15), which are just term features in various forms.

{
"store" : "MyFeatureStore",
"name" : "term_count_1",
"class" : "com.apache.solr.ltr.feature.TermCountFeature",
"params" : {
   "field" : "a_text",
   "terms" : "${user_terms}",
   "method"  : "1"
}
  },

{
"store" : "MyFeatureStore",
"name" : "term_count_2",
"class" : "com.apache.solr.ltr.feature.TermCountFeature",
"params" : {
   "field" : "a_text",
   "terms" : "${user_terms}",
   "method"  : "2"
}
  },

where the method id corresponds to the features in Table 2 (1-15). Although
those features share the same class, the differences between them are minor.
In a production deployment this overhead may not be an issue; after feature
selection, probably only a small number of features are useful.

Another use case:
use a convolutional neural network or LSTM to extract an embedded feature
vector for both the query and the document, where the dimension of the
embedded feature vectors would be 50-100, and then feed those features into
learning-to-rank models.

> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
I will do some performance benchmarks for a few use cases to justify
whether supporting multiple features per feature class is worthwhile.
If so, I will share the results and create a pull request.

Thanks

Jianxiong

On 4/18/17, Michael Nilsson  wrote:
> Hi Jianxiong,
>
> What you say is true.  If you want 100 different feature values extracted,
> you need to specify 100 different features in the
> features.json config so that there is a direct mapping of features in and
> features out.  However, you more than likely need
> to only implement 1 feature class that you will use for those 100 feature
> values.  You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.  In some cases you might be able to
> just have 1 feature output 1 value that changes per document, if you can
> collapse those features together.  This 2nd option
> may or may not work for you depending on your data, what you are trying to
> bucket, and what algorithm you are trying to
> use because not all algorithms can easily handle this case.  To illustrate:
>
>
> *A) Multiple binary features using the same 1 class*
> {
> "name" : "isProductCheap",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
>   "fq": [ "price:[0 TO 100]" ]
> }
> },{
> "name" : "isProductExpensive",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
>   "fq": [ "price:[101 TO 1000]" ]
> }
> },{
> "name" : "isProductCrazyExpensive",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
>   "fq": [ "price:[1001 TO *]" ]
> }
> }
>
>
> *B) 1 feature that outputs different values (some algorithms don't handle
> discrete features well)*
> {
> "name" : "productPricePoint",
> "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
> "params" : {
>
>   // Either hard code price map in MyPricePointFeature.java, or
>   // pass it in through params for flexible customization,
>   // and return different values for cheap, expensive, and
> crazyExpensive
>
> }
> }
>
> The 2 options above satisfy most use cases, which is what we were
> targeting.
> In my specific use case, I opted for option A,
> and wrote a simple script that generates the features.json so I wouldn't
> have to write 100 similar features by hand.  You
> also mentioned that you want to extract features sparsely.  You can change
> the configuration of the Feature Transformer to return features that
> actually triggered in a sparse format.
> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
> be more than welcome!
>
> -Michael
>
>
>
> On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong 
> wrote:
>
>> Hi,
>> I found that solr learning-to-rank (LTR) supports only ONE feature
>> for a given feature extractor.
>>
>> See interface:
>>
>> https://github.com/apache/lucene-solr/blob/master/solr/
>> contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>>
>> Line (281, 282) (in FeatureScorer)
>> @Override
>>   public abstract 

Re: Security.json file caused Solr to stop working

2017-04-18 Thread Zheng Lin Edwin Yeo
This was due to a missing comma in the JSON file.

{
"authentication":{
   "blockUnknown": false,
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit",
  "role":"admin"}],
   "user-role":{"solr":"admin"}
}}
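
For reference, one way to sanity-check and re-upload the file, assuming a local
ZooKeeper on port 9983 and the zkcli script shipped with Solr (paths are the
stock ones):

# make sure the JSON parses before uploading
python -m json.tool security.json

# push it to ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json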


Regards,
Edwin

On 10 April 2017 at 16:34, Zheng Lin Edwin Yeo  wrote:

> This is the error message that I get.
>
> 2017-04-10 08:30:05.766 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter
> Could not start Solr. Check solr/home property and the logs
> 2017-04-10 08:30:05.779 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:java.lang.ClassCastException:
> java.lang.String cannot be cast to java.util.Map
> at org.apache.solr.common.cloud.ZkStateReader.getSecurityProps(
> ZkStateReader.java:908)
> at org.apache.solr.common.cloud.ZkStateReader.
> createClusterStateWatchersAndUpdate(ZkStateReader.java:433)
> at org.apache.solr.cloud.ZkController.init(ZkController.java:672)
> at org.apache.solr.cloud.ZkController.(ZkController.java:419)
> at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:112)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:465)
> at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(
> SolrDispatchFilter.java:235)
> at org.apache.solr.servlet.SolrDispatchFilter.init(
> SolrDispatchFilter.java:167)
> at org.eclipse.jetty.servlet.FilterHolder.initialize(
> FilterHolder.java:137)
> at org.eclipse.jetty.servlet.ServletHandler.initialize(
> ServletHandler.java:873)
> at org.eclipse.jetty.servlet.ServletContextHandler.startContext(
> ServletContextHandler.java:349)
> at org.eclipse.jetty.webapp.WebAppContext.startWebapp(
> WebAppContext.java:1404)
> at org.eclipse.jetty.webapp.WebAppContext.startContext(
> WebAppContext.java:1366)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doStart(ContextHandler.java:778)
> at org.eclipse.jetty.servlet.ServletContextHandler.doStart(
> ServletContextHandler.java:262)
> at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:520)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.
> start(AbstractLifeCycle.java:68)
> at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(
> StandardStarter.java:41)
> at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(
> AppLifeCycle.java:188)
> at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(
> DeploymentManager.java:499)
> at org.eclipse.jetty.deploy.DeploymentManager.addApp(
> DeploymentManager.java:147)
> at org.eclipse.jetty.deploy.providers.ScanningAppProvider.
> fileAdded(ScanningAppProvider.java:180)
> at org.eclipse.jetty.deploy.providers.WebAppProvider.
> fileAdded(WebAppProvider.java:458)
> at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(
> ScanningAppProvider.java:64)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
> at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
> at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.
> start(AbstractLifeCycle.java:68)
> at org.eclipse.jetty.deploy.providers.ScanningAppProvider.
> doStart(ScanningAppProvider.java:150)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.
> start(AbstractLifeCycle.java:68)
> at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(
> DeploymentManager.java:561)
> at org.eclipse.jetty.deploy.DeploymentManager.doStart(
> DeploymentManager.java:236)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.
> start(AbstractLifeCycle.java:68)
> at org.eclipse.jetty.util.component.ContainerLifeCycle.
> start(ContainerLifeCycle.java:131)
> at org.eclipse.jetty.server.Server.start(Server.java:422)
> at org.eclipse.jetty.util.component.ContainerLifeCycle.
> doStart(ContainerLifeCycle.java:113)
> at org.eclipse.jetty.server.handler.AbstractHandler.
> doStart(AbstractHandler.java:61)
> at org.eclipse.jetty.server.Server.doStart(Server.java:389)
> at org.eclipse.jetty.util.component.AbstractLifeCycle.
> start(AbstractLifeCycle.java:68)
> at org.eclipse.jetty.xml.XmlConfiguration$1.run(
> XmlConfiguration.java:1516)
> at java.security.AccessController.doPrivileged(Native Method)
> at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1441)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:214)
> at org.eclipse.jetty.start.Main.start(Main.java:457)
> at org.eclipse.jetty.start.Main.main(Main.java:75)
>
>
> Regards,
> Edwin
>
> On 10 

Re: prefix facet performance

2017-04-18 Thread Maria Muslea
Hmmm, not sure. Probably in the range of 100K-500K.

Before writing the email I was just looking at:
http://yonik.com/facet-performance/

Wow, using facet.method=enum makes a big difference. I will read up on it to
understand what it does.
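
For reference, a sketch of the request with that parameter added; the field and
prefix are taken from the thread, the host and core name are assumed:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum

With facet.prefix, the enum method only walks terms that start with the prefix,
while the default fc method first counts every term for all matching documents,
which is why the difference is so large here.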

Thank you so much.

Maria

On Tue, Apr 18, 2017 at 5:21 PM, Yonik Seeley  wrote:

> How many unique values in the index?
> You could try facet.method=enum
>
> -Yonik
>
>
> On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea 
> wrote:
> > Hi,
> >
> > I have ~40K documents in SOLR (not many) and a multivalued facet field
> that
> > contains at least 2K values per document.
> >
> > The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc,
> and
> > I use facet.prefix.
> >
> > q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
> >
> >
> > with "concept" defined as:
> >
> >
> > 
> >
> >
> > This generates the output that I am looking for, but it takes more than
> 10
> > seconds per query.
> >
> >
> > Is there any way that I could improve the facet query performance for
> this
> > example?
> >
> >
> > Thank you,
> >
> > Maria
>


Re: prefix facet performance

2017-04-18 Thread Yonik Seeley
How many unique values in the index?
You could try facet.method=enum

-Yonik


On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea  wrote:
> Hi,
>
> I have ~40K documents in SOLR (not many) and a multivalued facet field that
> contains at least 2K values per document.
>
> The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc, and
> I use facet.prefix.
>
> q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
>
>
> with "concept" defined as:
>
>
> 
>
>
> This generates the output that I am looking for, but it takes more than 10
> seconds per query.
>
>
> Is there any way that I could improve the facet query performance for this
> example?
>
>
> Thank you,
>
> Maria


prefix facet performance

2017-04-18 Thread Maria Muslea
Hi,

I have ~40K documents in SOLR (not many) and a multivalued facet field that
contains at least 2K values per document.

The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc, and
I use facet.prefix.

q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/


with "concept" defined as:





This generates the output that I am looking for, but it takes more than 10
seconds per query.


Is there any way that I could improve the facet query performance for this
example?


Thank you,

Maria


Re: AnalyzingInfixSuggester performance

2017-04-18 Thread Michael McCandless
It also indexes edge ngrams for short sequences (e.g. a*, b*, etc.) and
switches to ordinary PrefixQuery for longer sequences, and does some work
at search time to do the "infixing".

But yeah otherwise that's it.

If your ranking at lookup isn't exactly matching the weight, but "roughly"
has some correlation to it, you could still use the fast early termination,
except collect deeper than just the top N to ensure you likely found the
best hits according to your ranking function.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Apr 18, 2017 at 4:35 PM, OTH  wrote:

> I see.  I had actually overlooked the fact that Suggester provides a
> 'weightField', and I could possibly use that in my case instead of the
> regular Solr index with bq.
>
> So if I understand then - the main advantage of using the
> AnalyzingInfixSuggester instead of a regular Solr index (since both are
> using standard Lucene?) is that the AInfixSuggester does sorting at
> index-time using the weightField?  So it's only ever advantageous to use
> this Suggester if you need sorting based on a field?
>
> Thanks
>
> On Tue, Apr 18, 2017 at 2:20 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > AnalyzingInfixSuggester uses index-time sort, to sort all postings by the
> > suggest weight, so that lookup, as long as your sort by the suggest
> weight
> > is extremely fast.
> >
> > But if you need to rank at lookup time by something not "congruent" with
> > the index-time sort then you lose that benefit.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Sun, Apr 16, 2017 at 11:46 AM, OTH  wrote:
> >
> > > Hello,
> > >
> > > From what I understand, the AnalyzingInfixSuggester is using a simple
> > > Lucene query; so I was wondering, how then would this suggester have
> > better
> > > performance than using a simple Solr 'select' query on a regular Solr
> > index
> > > (with an asterisk placed at the start and end of the query string).  I
> > > could understand why say an FST based suggester would be faster, but I
> > > wanted to confirm if that indeed is the case with
> > AnalyzingInfixSuggester.
> > >
> > > One reason I ask is:
> > > I needed the results to be boosted based on the value of another field;
> > > e.g., if a user in the UK is searching for cities, then I'd need the
> > cities
> > > which are in the UK to be boosted.  I was able to do this with a
> regular
> > > Solr index by adding something like these parameters:
> > > defType=edismax&bq=country:UK^2.0
> > >
> > > However, I'm not sure if this is possible with the Suggester.
> Moreover -
> > > other than the 'country' field above, there are other fields as well
> > which
> > > I need to be returned with the results.  Since the Suggester seems to
> > only
> > > allow one additional field, called 'payload', I'm able to do this by
> > > putting the values of all the other fields into a JSON and then placing
> > > that into the 'payload' field - however, I don't know if it would be
> > > possible then to incorporate the boosting mechanism I showed above.
> > >
> > > So I was thinking of just using a regular Solr index instead of the
> > > Suggester; I wanted to confirm, what if any is the performance
> > improvement
> > > in using the AnalyzingInfixSuggester over using a regular index?
> > >
> > > Much thanks
> > >
> >
>


Re: Getting error while executing full import

2017-04-18 Thread Mikhail Khludnev
Ok. I've checked AbstractSqlEntityProcessorTestCase.
Please make the next attempt with

where="PROPERTY_ID=propertiesList.PROPERTY_ID"


On Tue, Apr 18, 2017 at 4:35 PM, ankur.168  wrote:

> Yes, both column names are same. But if we just use property_id=property_id
> in child entity, then how zipper gets to know which child document to merge
> with which parent?
>
> Any how I just tried with ur suggested where condition which result in
> arrayindexoutofbound exception, here are the logs
>
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:561)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:414)
> ... 36 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.solr.handler.dataimport.VariableResolver.
> resolve(VariableResolver.java:110)
> at
> org.apache.solr.handler.dataimport.ContextImpl.
> resolve(ContextImpl.java:250)
> at org.apache.solr.handler.dataimport.Zipper.onNewParent(
> Zipper.java:106)
> at
> org.apache.solr.handler.dataimport.EntityProcessorBase.init(
> EntityProcessorBase.java:63)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.
> init(SqlEntityProcessor.java:52)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(
> EntityProcessorWrapper.java:75)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:433)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:516)
> ... 37 more
>
> Thanks,
> --Ankur
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Getting-error-while-excuting-full-import-tp4329153p4330498.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: AnalyzingInfixSuggester performance

2017-04-18 Thread OTH
I see.  I had actually overlooked the fact that Suggester provides a
'weightField', and I could possibly use that in my case instead of the
regular Solr index with bq.

So if I understand then - the main advantage of using the
AnalyzingInfixSuggester instead of a regular Solr index (since both are
using standard Lucene?) is that the AInfixSuggester does sorting at
index-time using the weightField?  So it's only ever advantageous to use
this Suggester if you need sorting based on a field?
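
For reference, a rough sketch of such a suggester definition in solrconfig.xml;
the field names are invented for illustration:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">citySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">city_name</str>
    <str name="weightField">population</str>
    <str name="payloadField">extra_json</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

The weightField value is what drives the index-time sort Mike describes; the
suggest request itself has no per-lookup boost parameter comparable to bq.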

Thanks

On Tue, Apr 18, 2017 at 2:20 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> AnalyzingInfixSuggester uses index-time sort, to sort all postings by the
> suggest weight, so that lookup, as long as your sort by the suggest weight
> is extremely fast.
>
> But if you need to rank at lookup time by something not "congruent" with
> the index-time sort then you lose that benefit.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Apr 16, 2017 at 11:46 AM, OTH  wrote:
>
> > Hello,
> >
> > From what I understand, the AnalyzingInfixSuggester is using a simple
> > Lucene query; so I was wondering, how then would this suggester have
> better
> > performance than using a simple Solr 'select' query on a regular Solr
> index
> > (with an asterisk placed at the start and end of the query string).  I
> > could understand why say an FST based suggester would be faster, but I
> > wanted to confirm if that indeed is the case with
> AnalyzingInfixSuggester.
> >
> > One reason I ask is:
> > I needed the results to be boosted based on the value of another field;
> > e.g., if a user in the UK is searching for cities, then I'd need the
> cities
> > which are in the UK to be boosted.  I was able to do this with a regular
> > Solr index by adding something like these parameters:
> > defType=edismax&bq=country:UK^2.0
> >
> > However, I'm not sure if this is possible with the Suggester.  Moreover -
> > other than the 'country' field above, there are other fields as well
> which
> > I need to be returned with the results.  Since the Suggester seems to
> only
> > allow one additional field, called 'payload', I'm able to do this by
> > putting the values of all the other fields into a JSON and then placing
> > that into the 'payload' field - however, I don't know if it would be
> > possible then to incorporate the boosting mechanism I showed above.
> >
> > So I was thinking of just using a regular Solr index instead of the
> > Suggester; I wanted to confirm, what if any is the performance
> improvement
> > in using the AnalyzingInfixSuggester over using a regular index?
> >
> > Much thanks
> >
>


Re: How to change stateFormat to 2

2017-04-18 Thread Erick Erickson
There should be no need to set CLUSTERPROP more than once, it's a
characteristic of your entire, well, cluster. See clusterprops.json in
your admin UI>>tree view.

Best,
Erick

On Tue, Apr 18, 2017 at 10:21 AM, Manohar Sripada  wrote:
> Thanks Erick!
> state.json exists for each collection in the "tree" view of admin UI. So,
> that format is set to 2. I will call the CLUSTERPROP collections API too
> and set legacyCloud=false whenever I create a collection.
>
> Thanks
>
> On Tue, Apr 18, 2017 at 8:50 PM, Erick Erickson 
> wrote:
>
>> clusterstate.json will exist, it just should be empty if you're using
>> state format 2.
>>
>> Note: if you have "state.json" files under each collections in ZK (see
>> the "tree" view in the admin UI), then you _are_ in the format 2
>> world. However, for Solr 5.x, there's an obscure property
>> "legacyCloud" that, if true will allow orphan replicas to reconstruct
>> themselves in clusterstate.json even if the format is 2. The condition
>> is that you have orphan replicas out there (where you've deleted the
>> collection but for some reason were unable to delete the replica, say
>> the Solr node hosting some replicas was down and you restarted it).
>> When Solr starts up, this orphan reconstructs itself in
>> clusterstate.json, where it's ignored.
>>
>> So you should set legacyCloud=false using the CLUSTERPROP (IIRC)
>> collections API call. You can also just delete the _data_ from
>> clusterstate.json. ASSUMING you're in format 2.
>>
>> If you're really in format 1, then see MIGRATESTATEFORMAT here:
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#
>> CollectionsAPI-MIGRATESTATEFORMAT:MigrateClusterState
>>
>> Best,
>> Erick
>>
>> On Tue, Apr 18, 2017 at 8:03 AM, Manohar Sripada 
>> wrote:
>> > After deleting a collection through Collection API, the data is not
>> getting
>> > deleted from clusterstate.json. Based on this discussion
>> > > gets-stuck-on-node-restart-td4311994.html>,
>> > it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
>> > 5.2.1). It also mentions that stateFormat should be set to 2.
>> >
>> > How to set stateFormat to 2 while calling the Collection API? Can I
>> default
>> > it to 2 during the setup itself so that I dont need to set it up for each
>> > and every collection creation?
>> >
>> > Thanks in Advance!
>>


Re: SolrJ and Streaming

2017-04-18 Thread Joe Obernberger
Thank you Joel; exactly what I needed!  Just had to change it to use 
CloudSolrStream instead.

Much appreciated!

-Joe
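
For anyone who lands on this thread later, a self-contained version of the
pattern Joel posted; the URL, collection name and expression are placeholders:

import java.io.IOException;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamDemo {
  public static void main(String[] args) throws IOException {
    // any node that can reach the collection; placeholder values
    String url = "http://localhost:8983/solr/mycollection";
    String expr = "search(mycollection, q=\"*:*\", fl=\"id\", sort=\"id asc\")";

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("expr", expr);    // the streaming expression to run
    params.set("qt", "/stream"); // send it to the /stream handler

    TupleStream stream = new SolrStream(url, params);
    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {         // the stream always ends with an EOF tuple
          break;
        }
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
    }
  }
}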


On 4/18/2017 3:21 PM, Joel Bernstein wrote:

Are you trying to send a streaming expression using SolrJ?

If you are you can send the expression with the SolrStream. for example:

params = new ModifiableSolrParams();
params.set("expr", expr);
params.set("qt", "/stream");
SolrStream stream = new SolrStream(url, paramsLoc);

try {

stream.open();

while(true) {

   Tuple tuple = stream.read();

   if(tuple.EOF) {

   break;

   }

}

} finally {

   stream.close();

}



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 18, 2017 at 2:33 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:


Hi All - any examples of using solrJ and streaming expressions available?
Like calling UpdateStream from solrJ?

Thank you!

-Joe






Re: SolrJ and Streaming

2017-04-18 Thread Joel Bernstein
paramsLoc in my last email should be params

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 18, 2017 at 3:21 PM, Joel Bernstein  wrote:

> Are you trying to send a streaming expression using SolrJ?
>
> If you are you can send the expression with the SolrStream. for example:
>
> params = new ModifiableSolrParams();
> params.set("expr", expr);
> params.set("qt", "/stream");
> SolrStream stream = new SolrStream(url, paramsLoc);
>
> try {
>
>stream.open();
>
>while(true) {
>
>   Tuple tuple = stream.read();
>
>   if(tuple.EOF) {
>
>   break;
>
>   }
>
>}
>
> } finally {
>
>   stream.close();
>
> }
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Apr 18, 2017 at 2:33 PM, Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
>> Hi All - any examples of using solrJ and streaming expressions
>> available?  Like calling UpdateStream from solrJ?
>>
>> Thank you!
>>
>> -Joe
>>
>>
>


Re: SolrJ and Streaming

2017-04-18 Thread Joel Bernstein
Are you trying to send a streaming expression using SolrJ?

If you are you can send the expression with the SolrStream. for example:

params = new ModifiableSolrParams();
params.set("expr", expr);
params.set("qt", "/stream");
SolrStream stream = new SolrStream(url, paramsLoc);

try {

   stream.open();

   while(true) {

  Tuple tuple = stream.read();

  if(tuple.EOF) {

  break;

  }

   }

} finally {

  stream.close();

}



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 18, 2017 at 2:33 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Hi All - any examples of using solrJ and streaming expressions available?
> Like calling UpdateStream from solrJ?
>
> Thank you!
>
> -Joe
>
>


SolrJ and Streaming

2017-04-18 Thread Joe Obernberger
Hi All - any examples of using solrJ and streaming expressions 
available?  Like calling UpdateStream from solrJ?


Thank you!

-Joe



Re: How to change stateFormat to 2

2017-04-18 Thread Manohar Sripada
Thanks Erick!
state.json exists for each collection in the "tree" view of admin UI. So,
that format is set to 2. I will call the CLUSTERPROP collections API too
and set legacyCloud=false whenever I create a collection.

Thanks

On Tue, Apr 18, 2017 at 8:50 PM, Erick Erickson 
wrote:

> clusterstate.json will exist, it just should be empty if you're using
> state format 2.
>
> Note: if you have "state.json" files under each collections in ZK (see
> the "tree" view in the admin UI), then you _are_ in the format 2
> world. However, for Solr 5.x, there's an obscure property
> "legacyCloud" that, if true will allow orphan replicas to reconstruct
> themselves in clusterstate.json even if the format is 2. The condition
> is that you have orphan replicas out there (where you've deleted the
> collection but for some reason were unable to delete the replica, say
> the Solr node hosting some replicas was down and you restarted it).
> When Solr starts up, this orphan reconstructs itself in
> clusterstate.json, where it's ignored.
>
> So you should set legacyCloud=false using the CLUSTERPROP (IIRC)
> collections API call. You can also just delete the _data_ from
> clusterstate.json. ASSUMING you're in format 2.
>
> If you're really in format 1, then see MIGRATESTATEFORMAT here:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#
> CollectionsAPI-MIGRATESTATEFORMAT:MigrateClusterState
>
> Best,
> Erick
>
> On Tue, Apr 18, 2017 at 8:03 AM, Manohar Sripada 
> wrote:
> > After deleting a collection through Collection API, the data is not
> getting
> > deleted from clusterstate.json. Based on this discussion
> >  gets-stuck-on-node-restart-td4311994.html>,
> > it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
> > 5.2.1). It also mentions that stateFormat should be set to 2.
> >
> > How to set stateFormat to 2 while calling the Collection API? Can I
> default
> > it to 2 during the setup itself so that I dont need to set it up for each
> > and every collection creation?
> >
> > Thanks in Advance!
>


Re: extract multi-features for one solr feature extractor in solr learning to rank

2017-04-18 Thread Michael Nilsson
Hi Jianxiong,

What you say is true.  If you want 100 different feature values extracted,
you need to specify 100 different features in the
features.json config so that there is a direct mapping of features in and
features out.  However, you more than likely need
to only implement 1 feature class that you will use for those 100 feature
values.  You can pass in different params in the
features.json config for each feature, even though they use the same
feature class.  In some cases you might be able to
just have 1 feature output 1 value that changes per document, if you can
collapse those features together.  This 2nd option
may or may not work for you depending on your data, what you are trying to
bucket, and what algorithm you are trying to
use because not all algorithms can easily handle this case.  To illustrate:


*A) Multiple binary features using the same 1 class*
{
"name" : "isProductCheap",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
  "fq": [ "price:[0 TO 100]" ]
}
},{
"name" : "isProductExpensive",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
  "fq": [ "price:[101 TO 1000]" ]
}
},{
"name" : "isProductCrazyExpensive",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
  "fq": [ "price:[1001 TO *]" ]
}
}


*B) 1 feature that outputs different values (some algorithms don't handle
discrete features well)*
{
"name" : "productPricePoint",
"class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
"params" : {

  // Either hard code price map in MyPricePointFeature.java, or
  // pass it in through params for flexible customization,
  // and return different values for cheap, expensive, and
crazyExpensive

}
}

The 2 options above satisfy most use cases, which is what we were targeting.
In my specific use case, I opted for option A,
and wrote a simple script that generates the features.json so I wouldn't
have to write 100 similar features by hand.  You
also mentioned that you want to extract features sparsely.  You can change
the configuration of the Feature Transformer to return features that
actually triggered in a sparse format.
Your performance point about 100 features vs 1 feature is true,
and pull requests to improve the plugin's performance and usability would
be more than welcome!
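
For reference, that switch sits on the transformer definition in solrconfig.xml;
a sketch follows, and the defaultFormat parameter name is from memory, so check
the ltr contrib documentation for your version:

<transformer name="features"
    class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
  <!-- assumed parameter name: switch logged feature vectors from dense to sparse -->
  <str name="defaultFormat">sparse</str>
</transformer>

Features are then logged per document by adding fl=id,score,[features] to the
request.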

-Michael



On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong 
wrote:

> Hi,
> I found that solr learning-to-rank (LTR) supports only ONE feature
> for a given feature extractor.
>
> See interface:
>
> https://github.com/apache/lucene-solr/blob/master/solr/
> contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>
> Line (281, 282) (in FeatureScorer)
> @Override
>   public abstract float score() throws IOException;
>
> I have a user case: given a , I like to extract multiple
> features (e.g.  100 features.  In the current framework,  I have to
> define 100 features in feature.json. Also more cost for scored doc
> iterations).
>
> I would like to have an interface:
>
> public abstract Map score() throws IOException;
>
> It helps support sparse vector feature.
>
> Can anybody provide an insight?
>
> Thanks
>
> Jianxiong
>


Re: How to change stateFormat to 2

2017-04-18 Thread Erick Erickson
clusterstate.json will exist, it just should be empty if you're using
state format 2.

Note: if you have "state.json" files under each collection in ZK (see
the "tree" view in the admin UI), then you _are_ in the format 2
world. However, for Solr 5.x, there's an obscure property
"legacyCloud" that, if true will allow orphan replicas to reconstruct
themselves in clusterstate.json even if the format is 2. The condition
is that you have orphan replicas out there (where you've deleted the
collection but for some reason were unable to delete the replica, say
the Solr node hosting some replicas was down and you restarted it).
When Solr starts up, this orphan reconstructs itself in
clusterstate.json, where it's ignored.

So you should set legacyCloud=false using the CLUSTERPROP (IIRC)
collections API call. You can also just delete the _data_ from
clusterstate.json. ASSUMING you're in format 2.

If you're really in format 1, then see MIGRATESTATEFORMAT here:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-MIGRATESTATEFORMAT:MigrateClusterState
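
Concretely, both are plain Collections API calls; the node address and
collection name below are placeholders:

curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false'

curl 'http://localhost:8983/solr/admin/collections?action=MIGRATESTATEFORMAT&collection=mycollection'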

Best,
Erick

On Tue, Apr 18, 2017 at 8:03 AM, Manohar Sripada  wrote:
> After deleting a collection through Collection API, the data is not getting
> deleted from clusterstate.json. Based on this discussion
> ,
> it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
> 5.2.1). It also mentions that stateFormat should be set to 2.
>
> How to set stateFormat to 2 while calling the Collection API? Can I default
> it to 2 during the setup itself so that I dont need to set it up for each
> and every collection creation?
>
> Thanks in Advance!


Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Joel Bernstein
Interesting, that inverting the on clause worked. Something is not working
as designed.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 18, 2017 at 11:09 AM, Dominique Bejean <
dominique.bej...@eolya.fr> wrote:

> Done
> https://issues.apache.org/jira/browse/SOLR-10512
>
> Regards.
>
> Dominique
>
>
> On Tue, Apr 18, 2017 at 14:51, Joel Bernstein wrote:
>
> >  I looked through the test cases I don't think we have this covered
> exactly
> > as it's written.  Can you log a jira for this?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean <
> > dominique.bej...@eolya.fr
> > > wrote:
> >
> > > Hi,
> > >
> > > I do not understand what I am doing wrong in this simple query.
> > >
> > > curl --data-urlencode 'expr=innerJoin(
> > > search(books,
> > >q="*:*",
> > >fl="id",
> > >sort="id asc"),
> > > search(reviews,
> > >q="*:*",
> > >fl="id_book_s",
> > >sort="id_book_s asc"),
> > > on="id=id_books_s"
> > > )' http://localhost:8983/solr/books/stream
> > >
> > > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
> > > stream comparators (sort) must be a superset of this stream's
> > > equalitor.","EOF":true}]}}
> > >
> > >
> > > It is totally similar to the documentation example:
> > > innerJoin(
> > >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> > >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> > >   on="personId=ownerId"
> > > )
> > >
> > >
> > >
> > > Queries on each collection give :
> > >
> > > $ curl --data-urlencode 'expr=search(books,
> > >q="*:*",
> > >fl="id, title_s, pubyear_i",
> > >sort="pubyear_i asc",
> > >qt="/export")'
> > > http://localhost:8983/solr/books/stream
> > >
> > > {
> > >   "result-set": {
> > > "docs": [
> > >   {
> > > "title_s": "Friends",
> > > "pubyear_i": 1994,
> > > "id": "book2"
> > >   },
> > >   {
> > > "title_s": "The Way of Kings",
> > > "pubyear_i": 2010,
> > > "id": "book1"
> > >   },
> > >   {
> > > "EOF": true,
> > > "RESPONSE_TIME": 16
> > >   }
> > > ]
> > >   }
> > > }
> > >
> > >
> > > $ curl --data-urlencode 'expr=search(reviews,
> > >q="author_s:d*",
> > >fl="id, id_book_s, stars_i,
> > review_dt",
> > >sort="id_book_s asc",
> > >qt="/export")'
> > > http://localhost:8983/solr/reviews/stream
> > >
> > > {
> > >   "result-set": {
> > > "docs": [
> > >   {
> > > "stars_i": 3,
> > > "id": "book1_c2",
> > > "id_book_s": "book1",
> > > "review_dt": "2014-03-15T12:00:00Z"
> > >   },
> > >   {
> > > "stars_i": 4,
> > > "id": "book1_c3",
> > > "id_book_s": "book1",
> > > "review_dt": "2014-12-15T12:00:00Z"
> > >   },
> > >   {
> > > "stars_i": 3,
> > > "id": "book2_c2",
> > > "id_book_s": "book2",
> > > "review_dt": "1994-03-15T12:00:00Z"
> > >   },
> > >   {
> > > "stars_i": 4,
> > > "id": "book2_c3",
> > > "id_book_s": "book2",
> > > "review_dt": "1994-12-15T12:00:00Z"
> > >   },
> > >   {
> > > "EOF": true,
> > > "RESPONSE_TIME": 47
> > >   }
> > > ]
> > >   }
> > > }
> > >
> > >
> > > Can someone help me to find my mistake ?
> > >
> > > Regards
> > >
> > > Dominique
> > > --
> > > Dominique Béjean
> > > 06 08 46 12 43
> > >
> >
> --
> Dominique Béjean
> 06 08 46 12 43
>


Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
Done
https://issues.apache.org/jira/browse/SOLR-10512

Regards.

Dominique


On Tue, Apr 18, 2017 at 14:51, Joel Bernstein wrote:

>  I looked through the test cases I don't think we have this covered exactly
> as it's written.  Can you log a jira for this?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean <
> dominique.bej...@eolya.fr
> > wrote:
>
> > Hi,
> >
> > I do not understand what I am doing wrong in this simple query.
> >
> > curl --data-urlencode 'expr=innerJoin(
> > search(books,
> >q="*:*",
> >fl="id",
> >sort="id asc"),
> > search(reviews,
> >q="*:*",
> >fl="id_book_s",
> >sort="id_book_s asc"),
> > on="id=id_books_s"
> > )' http://localhost:8983/solr/books/stream
> >
> > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
> > stream comparators (sort) must be a superset of this stream's
> > equalitor.","EOF":true}]}}
> >
> >
> > It is totally similar to the documentation example:
> > innerJoin(
> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> >   on="personId=ownerId"
> > )
> >
> >
> >
> > Queries on each collection give :
> >
> > $ curl --data-urlencode 'expr=search(books,
> >q="*:*",
> >fl="id, title_s, pubyear_i",
> >sort="pubyear_i asc",
> >qt="/export")'
> > http://localhost:8983/solr/books/stream
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "title_s": "Friends",
> > "pubyear_i": 1994,
> > "id": "book2"
> >   },
> >   {
> > "title_s": "The Way of Kings",
> > "pubyear_i": 2010,
> > "id": "book1"
> >   },
> >   {
> > "EOF": true,
> > "RESPONSE_TIME": 16
> >   }
> > ]
> >   }
> > }
> >
> >
> > $ curl --data-urlencode 'expr=search(reviews,
> >q="author_s:d*",
> >fl="id, id_book_s, stars_i,
> review_dt",
> >sort="id_book_s asc",
> >qt="/export")'
> > http://localhost:8983/solr/reviews/stream
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "stars_i": 3,
> > "id": "book1_c2",
> > "id_book_s": "book1",
> > "review_dt": "2014-03-15T12:00:00Z"
> >   },
> >   {
> > "stars_i": 4,
> > "id": "book1_c3",
> > "id_book_s": "book1",
> > "review_dt": "2014-12-15T12:00:00Z"
> >   },
> >   {
> > "stars_i": 3,
> > "id": "book2_c2",
> > "id_book_s": "book2",
> > "review_dt": "1994-03-15T12:00:00Z"
> >   },
> >   {
> > "stars_i": 4,
> > "id": "book2_c3",
> > "id_book_s": "book2",
> > "review_dt": "1994-12-15T12:00:00Z"
> >   },
> >   {
> > "EOF": true,
> > "RESPONSE_TIME": 47
> >   }
> > ]
> >   }
> > }
> >
> >
> > Can someone help me to find my mistake ?
> >
> > Regards
> >
> > Dominique
> > --
> > Dominique Béjean
> > 06 08 46 12 43
> >
>
-- 
Dominique Béjean
06 08 46 12 43


How to change stateFormat to 2

2017-04-18 Thread Manohar Sripada
After deleting a collection through Collection API, the data is not getting
deleted from clusterstate.json. Based on this discussion
,
it seems clusterstate.json shouldn't be there for Solr 5.x (I am using
5.2.1). It also mentions that stateFormat should be set to 2.

How to set stateFormat to 2 while calling the Collection API? Can I default
it to 2 during the setup itself so that I dont need to set it up for each
and every collection creation?

Thanks in Advance!


Re: Running Solr6 on Tomcat7

2017-04-18 Thread Shawn Heisey
On 4/18/2017 7:40 AM, rgummadi wrote:
> Is anyone successfull in running Solr6 on Tomcat7. If so can you give
> me some pointers on how you did this?

Running in this way is officially unsupported.  You can most likely do
it, but we strongly recommend that you don't.  You are on your own when
it comes to figuring out HOW to do it.

https://wiki.apache.org/solr/WhyNoWar

The webapp is already extracted as server/solr-webapp/webapp ... Tomcat
should include instructions for installing an app like this.  You will
also need logging jars from server/lib/ext to be installed somewhere
Tomcat can find them.

The following is important enough to say it again:  We recommend that
you do NOT do this.  Even though it probably will work, there's no
guarantee that any container other than the one Solr ships with is going
to run.  The admin UI in particular may not function correctly.  You may
also find that it's difficult to obtain help for a non-standard
install.  Eventually, the plan is to create Solr as a standalone
application that you won't be able to install into Tomcat.

Thanks,
Shawn



Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
Furthermore, because of the "all incoming stream comparators (sort) must be a
superset of this stream's equalitor" condition, it looks like it isn't
possible to sort the stream on another field, such as, in my example,
pubyear_s (books collection) or review_dt (reviews collection).

Dominique


On Tue, Apr 18, 2017 at 15:28, Dominique Bejean wrote:

> Hi,
>
> I reply to myself
>
> I just had to invert the "on" clause to make it work
>
> curl --data-urlencode 'expr=innerJoin(
> search(books,
>q="*:*",
>fl="id",
>sort="id asc"),
> search(reviews,
>q="*:*",
>fl="id_book_s",
>sort="id_book_s asc"),
> on="id_books_s=id"
> )' http://localhost:8983/solr/books/stream
>
>
> 
>
> {
>   "result-set": {
> "docs": [
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "stars_i": 5,
> "id": "book1",
> "id_book_s": "book1",
> "review_dt": "2015-01-03T14:30:00Z"
>   },
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "stars_i": 3,
> "id": "book1",
> "id_book_s": "book1",
> "review_dt": "2014-03-15T12:00:00Z"
>   },
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "stars_i": 4,
> "id": "book1",
> "id_book_s": "book1",
> "review_dt": "2014-12-15T12:00:00Z"
>   },
>   {
> "title_s": "Friends",
> "pubyear_i": 1994,
> "stars_i": 5,
> "id": "book2",
> "id_book_s": "book2",
> "review_dt": "1995-01-03T14:30:00Z"
>   },
>   {
> "title_s": "Friends",
> "pubyear_i": 1994,
> "stars_i": 3,
> "id": "book2",
> "id_book_s": "book2",
> "review_dt": "1994-03-15T12:00:00Z"
>   },
>   {
> "title_s": "Friends",
> "pubyear_i": 1994,
> "stars_i": 4,
> "id": "book2",
> "id_book_s": "book2",
> "review_dt": "1994-12-15T12:00:00Z"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 35
>   }
> ]
>   }
> }
>
>
> However, I don't understand the reason as in debug mode I see
> the isValidTupleOrder method should return true in both case.
>
> Regards.
>
> Dominique
>
>
>
>
>
> On Tue, Apr 18, 2017 at 14:51, Joel Bernstein wrote:
>
>>  I looked through the test cases I don't think we have this covered
>> exactly
>> as it's written.  Can you log a jira for this?
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean <
>> dominique.bej...@eolya.fr
>> > wrote:
>>
>> > Hi,
>> >
>> > I do not understand what I am doing wrong in this simple query.
>> >
>> > curl --data-urlencode 'expr=innerJoin(
>> > search(books,
>> >q="*:*",
>> >fl="id",
>> >sort="id asc"),
>> > searchreviews,
>> >q="*:*",
>> >fl="id_book_s",
>> >sort="id_book_s asc"),
>> > on="id=id_books_s"
>> > )' http://localhost:8983/solr/books/stream
>> >
>> > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
>> > stream comparators (sort) must be a superset of this stream's
>> > equalitor.","EOF":true}]}}
>> >
>> >
>> > It is totally similar to the documentation example:
>> > innerJoin(
>> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>> >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
>> >   on="personId=ownerId"
>> > )
>> >
>> >
>> >
>> > Queries on each collection give :
>> >
>> > $ curl --data-urlencode 'expr=search(books,
>> >q="*:*",
>> >fl="id, title_s, pubyear_i",
>> >sort="pubyear_i asc",
>> >qt="/export")'
>> > http://localhost:8983/solr/books/stream
>> >
>> > {
>> >   "result-set": {
>> > "docs": [
>> >   {
>> > "title_s": "Friends",
>> > "pubyear_i": 1994,
>> > "id": "book2"
>> >   },
>> >   {
>> > "title_s": "The Way of Kings",
>> > "pubyear_i": 2010,
>> > "id": "book1"
>> >   },
>> >   {
>> > "EOF": true,
>> > "RESPONSE_TIME": 16
>> >   }
>> > ]
>> >   }
>> > }
>> >
>> >
>> > $ curl --data-urlencode 

Re: Filter Facet Query

2017-04-18 Thread Furkan KAMACI
Hi Alex,

I found the reason, thanks for the help. Faceting shows all possible values,
including ones with a count of 0.

Could you help on my last question:

I have facet results like:

"", 9
"research",6
"development",3


I want to filter the empty string ("") out of my facet results (I don't want
to add it to fq, just filter it from the facets). How can I do that?

On Tue, Apr 18, 2017 at 11:52 AM, Alexandre Rafalovitch 
wrote:

> Are you saying that all the values in the facet are zero with that
> query? The query you gave seems to be the super-basic faceting code,
> so maybe something super-basic is missing.
>
> E.g.
> *) Did you check that the documents you get back actually have any
> values in that field to facet on?
> *) Did you try making a query just by ID for a document that
> definitely has the value in that field?
> *) Did you do the query with echoParams=all to see that you are not
> having any hidden extra parameters that get appended?
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 18 April 2017 at 11:43, Furkan KAMACI  wrote:
> > OK, it returns 0 results every time.
> >
> > So,
> >
> > I want to filter out research values with empty string ("") from facet
> > result. How can I do that?
> >
> >
> > On Tue, Apr 18, 2017 at 8:53 AM, Furkan KAMACI 
> > wrote:
> >
> >> First problem is they do not match with main query.
> >>
> >> 18 Nis 2017 Sal, saat 01:54 tarihinde Dave <
> hastings.recurs...@gmail.com>
> >> şunu yazdı:
> >>
> >>> facet.mincount is what you're looking for to get non-zero facets
> >>>
> >>> > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI 
> >>> wrote:
> >>> >
> >>> > My query:
> >>> >
> >>> > /select?facet.field=research&facet=on&q=content:test
> >>> >
> >>> > Q1) Facet returns research values with 0 counts which has a research
> >>> value
> >>> > that is not from a document matched by main query (content:test). Is
> >>> that
> >>> > usual?
> >>> >
> >>> > Q2) I want to filter out research values with empty string ("") from
> >>> facet
> >>> > result. How can I do that?
> >>> >
> >>> > Kind Regards,
> >>> > Furkan KAMACI
> >>>
> >>
>


Running Solr6 on Tomcat7

2017-04-18 Thread rgummadi
Is anyone successful in running Solr 6 on Tomcat 7? If so, can you give me some
pointers on how you did it?




Re: Moving solr home

2017-04-18 Thread Shawn Heisey
On 4/18/2017 7:07 AM, tedsolr wrote:
> Looks like the issues are self inflicted. I have custom start/stop
> scripts that actually specify the solr home directory as a param to
> the start command (start -c -s ...). This was overriding my include
> variable. As for the magical solr.xml file, that's also my doing
> because as part of shutdown the script copies config files where they
> need to go. When I wrote all this 2 years ago I needed the flexibility
> to run multiple nodes on a single host (because requisitioning a
> server took an act of congress).

The service installer script works well for non-Windows systems, and has
one big advantage -- the people on this list will know exactly where
everything is if you are able to tell us the options you used on the
installer.  With custom scripts like what you wrote, we have no idea
where anything is.  The installer script defaults to separating the solr
home from the install dir.

I would strongly recommend using the service installer script if you're
not running on Windows.  It will create a user to run Solr so it's not
running as root.  You can install more than one instance using that
script.  Each one will get its own copy of the binaries.  Although it is
possible to run multiple instances out of one install directory, it does
require custom scripting.  The amount of disk space that separate
install directories use is usually very small compared to the indexes
and modern disk sizes.
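
For reference, the usual invocation looks like this; the archive name, paths
and port shown are just the documented defaults, so substitute your own:

tar xzf solr-5.5.4.tgz solr-5.5.4/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-5.5.4.tgz -i /opt -d /var/solr -u solr -s solr -p 8983

-i sets the install directory for the binaries and -d becomes the Solr home /
data area, which is exactly the separation described above.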

Thanks,
Shawn



Re: Getting error while executing full import

2017-04-18 Thread Shawn Heisey
On 4/18/2017 12:58 AM, ankur.168 wrote:
> Hi Erick,
>
> Thanks for replying, As you suggest I can use solrJ to map RDBMS fetched
> data and index/search it later on. but DIH gives multi db connection for
> full import and other benefits.
> Does solrJ supports this or we need to put efforts to make a multithreaded
> connection pool similar to DIH?

Each DIH handler is single-threaded.  I have no idea what you are
talking about when you mention multi-threaded in conjunction with DIH. 
You can have multiple handlers, and execute all of them in parallel, but
each one can only execute one import at a time, and that import will
only run with one thread.  Within the limitations caused by running
single-threaded, DIH is quite efficient.

With SolrJ, you can do anything the Java language permits you to do,
including very efficient handling of multiple threads, using multiple
databases, and pretty much anything else you can dream up ... but you
must write all the code.

Safety and good performance in a multi-threaded Java program is an art
form.  I hesitate to say that it's *difficult*, but it does create
challenges that may not be trivial to overcome.

The various SolrClient objects from SolrJ are thread-safe, although you
typically must create a custom HttpClient object to use for SolrClient
creation if you're going to be using a lot of threads, because
HttpClient defaults are to only allow a very minimal number of threads.
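
A sketch of what that typically looks like with SolrJ 5.x/6.x; the connection
limits are arbitrary example values:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ClientFactory {
  public static SolrClient create(String baseUrl) {
    ModifiableSolrParams params = new ModifiableSolrParams();
    // raise the defaults so many indexing threads can share one client
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 300);
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 100);
    HttpClient httpClient = HttpClientUtil.createClient(params);
    return new HttpSolrClient(baseUrl, httpClient);
  }
}

Newer SolrJ versions move this onto HttpSolrClient.Builder, but the idea is the
same.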

I've got no experience at all with parent/child documents, so this part
of your troubles is a mystery to me.

Thanks,
Shawn



Re: Getting error while executing full import

2017-04-18 Thread ankur.168
Yes, both column names are the same. But if we just use property_id=property_id
in the child entity, how does zipper know which child document to merge
with which parent?

Anyhow, I just tried your suggested where condition, which results in an
ArrayIndexOutOfBoundsException; here are the logs:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:561)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 36 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:110)
at
org.apache.solr.handler.dataimport.ContextImpl.resolve(ContextImpl.java:250)
at 
org.apache.solr.handler.dataimport.Zipper.onNewParent(Zipper.java:106)
at
org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:63)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.init(SqlEntityProcessor.java:52)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
... 37 more

Thanks,
--Ankur 





Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
Hi,

I reply to myself

I just had to invert the "on" clause to make it work

curl --data-urlencode 'expr=innerJoin(
search(books,
   q="*:*",
   fl="id",
   sort="id asc"),
search(reviews,
   q="*:*",
   fl="id_book_s",
   sort="id_book_s asc"),
on="id_books_s=id"
)' http://localhost:8983/solr/books/stream




{
  "result-set": {
"docs": [
  {
"title_s": "The Way of Kings",
"pubyear_i": 2010,
"stars_i": 5,
"id": "book1",
"id_book_s": "book1",
"review_dt": "2015-01-03T14:30:00Z"
  },
  {
"title_s": "The Way of Kings",
"pubyear_i": 2010,
"stars_i": 3,
"id": "book1",
"id_book_s": "book1",
"review_dt": "2014-03-15T12:00:00Z"
  },
  {
"title_s": "The Way of Kings",
"pubyear_i": 2010,
"stars_i": 4,
"id": "book1",
"id_book_s": "book1",
"review_dt": "2014-12-15T12:00:00Z"
  },
  {
"title_s": "Friends",
"pubyear_i": 1994,
"stars_i": 5,
"id": "book2",
"id_book_s": "book2",
"review_dt": "1995-01-03T14:30:00Z"
  },
  {
"title_s": "Friends",
"pubyear_i": 1994,
"stars_i": 3,
"id": "book2",
"id_book_s": "book2",
"review_dt": "1994-03-15T12:00:00Z"
  },
  {
"title_s": "Friends",
"pubyear_i": 1994,
"stars_i": 4,
"id": "book2",
"id_book_s": "book2",
"review_dt": "1994-12-15T12:00:00Z"
  },
  {
"EOF": true,
"RESPONSE_TIME": 35
  }
]
  }
}


However, I don't understand the reason, as in debug mode I see that
the isValidTupleOrder method should return true in both cases.

Regards.

Dominique





On Tue, Apr 18, 2017 at 14:51, Joel Bernstein wrote:

>  I looked through the test cases I don't think we have this covered exactly
> as it's written.  Can you log a jira for this?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean <
> dominique.bej...@eolya.fr
> > wrote:
>
> > Hi,
> >
> > I do not understand what I am doing wrong in this simple query.
> >
> > curl --data-urlencode 'expr=innerJoin(
> > search(books,
> >q="*:*",
> >fl="id",
> >sort="id asc"),
> > search(reviews,
> >q="*:*",
> >fl="id_book_s",
> >sort="id_book_s asc"),
> > on="id=id_books_s"
> > )' http://localhost:8983/solr/books/stream
> >
> > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
> > stream comparators (sort) must be a superset of this stream's
> > equalitor.","EOF":true}]}}
> >
> >
> > It is totally similar to the documentation example:
> > innerJoin(
> >   search(people, q=*:*, fl="personId,name", sort="personId asc"),
> >   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
> >   on="personId=ownerId"
> > )
> >
> >
> >
> > Queries on each collection give :
> >
> > $ curl --data-urlencode 'expr=search(books,
> >q="*:*",
> >fl="id, title_s, pubyear_i",
> >sort="pubyear_i asc",
> >qt="/export")'
> > http://localhost:8983/solr/books/stream
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "title_s": "Friends",
> > "pubyear_i": 1994,
> > "id": "book2"
> >   },
> >   {
> > "title_s": "The Way of Kings",
> > "pubyear_i": 2010,
> > "id": "book1"
> >   },
> >   {
> > "EOF": true,
> > "RESPONSE_TIME": 16
> >   }
> > ]
> >   }
> > }
> >
> >
> > $ curl --data-urlencode 'expr=search(reviews,
> >q="author_s:d*",
> >fl="id, id_book_s, stars_i,
> review_dt",
> >sort="id_book_s asc",
> >qt="/export")'
> > http://localhost:8983/solr/reviews/stream
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "stars_i": 3,
> > "id": "book1_c2",
> > "id_book_s": "book1",
> > "review_dt": "2014-03-15T12:00:00Z"
> >   },
> >   {
> > "stars_i": 4,
> > "id": "book1_c3",
> > "id_book_s": "book1",
> > 

Re: Upgrading cluster from 4 to 5. Slow replication detected.

2017-04-18 Thread Shawn Heisey
On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:
> We're starting to upgrade our solr cluster to version 5.5. So we
> removed one slave node from the cluster and installed solr 5.5.4 on it
> and started solr. So it started copying the index from the master.
> However, we noticed a drop in the replication speed compared to the
> other nodes which were still running solr 4. To do a fair comparison,
> I removed another slave node from the cluster and disabled replication
> on it till the new node has caught up with it. When both these nodes
> were at the same index generation, I turned replication on for both
> the nodes. Now, it has been over 15 hours since this exercise and the
> new node has again started lagging behind. Currently, the node with
> solr 5.5 is seven generations behind the other node.

Version 5 is capable of replication bandwidth throttling, but unless you
actually configure the maxWriteMBPerSec attribute in the replication
handler definition, this should not happen by default.
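
For reference, throttling is opt-in and would be configured on the
ReplicationHandler, roughly like this (a sketch only -- I am assuming the
parameter sits inside the slave section of the handler definition, so verify
the exact placement against the Index Replication docs for your version):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:20</str>
    <!-- hypothetical throttle: limit index fetch writes to 16 MB/sec -->
    <str name="maxWriteMBPerSec">16</str>
  </lst>
</requestHandler>

If nothing like this is present in solrconfig.xml, throttling is not the cause.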

One problem that I think might be possible is that the heap has been
left at the default 512MB on the new 5.5.4 install and therefore the
machine is doing constant full garbage collections to free up memory for
normal operation, which would make Solr run EXTREMELY slowly. 
Eventually a machine in this state would most likely encounter an
OutOfMemoryError.  On non-Windows systems, OOME will cause a forced halt
of the entire Solr instance.
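
If the heap does turn out to be the problem, raising it is a one-line change
in the include script (a minimal sketch, assuming the stock bin/solr scripts
read solr.in.sh; the 4g figure is only an example and should be sized for
your index):

# bin/solr.in.sh (or /etc/default/solr.in.sh for a service install)
SOLR_HEAP="4g"
# older style, equivalent:
# SOLR_JAVA_MEM="-Xms4g -Xmx4g"

Restart the node after changing it and watch GC behavior before drawing
conclusions.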

The heap might not be the problem ... if it's not, then I do not know
what is going on.  Are there any errors or warnings in solr.log?

Thanks,
Shawn



Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-18 Thread Shawn Heisey
On 4/10/2017 1:57 AM, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs. I am considering
> limiting the *maxWarmingSearchers* count in configuration but want to
> be sure that nothing breaks in production in case simultaneous commits
> do happen afterwards.

Don't do commits from multiple sources.  A good general practice with
Solr is to either use autoSoftCommit or add a commitWithin parameter to
each indexing request, so commits are fully automated and can't
overlap.  Make the interval on whichever method you use as large as you
can.  I would personally use 60000 (one minute) as a bare minimum, and
would prefer a larger number.
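
As an illustration, the commitWithin variant is just an extra parameter on
the update request (a sketch; host, collection and document are placeholders):

curl 'http://localhost:8983/solr/mycollection/update?commitWithin=60000' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","title_s":"example"}]'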

A soft commit takes less time/resources than a hard commit that opens a
searcher, but they are NOT even close to "free".  Opening the searcher
(which all soft commits do) is the expensive part, not the commit itself.

Regardless of what else you do, you should have autoCommit configured
with openSearcher set to false.  I would personally use a maxTime of
60000 (one minute) or 120000 (two minutes) for autoCommit. 
Recommendations and example configs will commonly have this set to 15
seconds.  That value works well, and does not usually cause problems,
but I like to put less of a load on the server, so I use a larger interval.
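
Put together, the settings described above look roughly like this in
solrconfig.xml (a sketch using the intervals I mentioned; tune maxTime to
your own tolerance for visibility lag):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit: flush to stable storage, but do not open a searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit: opens a new searcher, controls visibility of new documents -->
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>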

See this blog post for a detailed discussion:

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commit from different endpoints? I understand that solr
> will prevent opening a new searcher for the second commit but is that
> all there is to it? Does it mean solr will serve stale data( i.e. send
> stale data to the slaves) ignoring the changes from the second commit?
> Will these changes reflect only when a new searcher is initialized and
> will they be ignored till then? Do we even need searchers on the
> master as we will be querying only the slaves? What purpose do the
> searchers serve exactly? Your time and guidance will be very much
> appreciated. Thank you. 

If the maxWarmingSearchers value prevents a commit from opening a
searcher, then changes between the previous commit and that commit will
not be visible *on the master* until a later commit happens and IS able
to open a new searcher.  What happens on the slaves may be a little bit
different, because commits normally only happen on the slave when a
changed index is replicated from the master.

The usual historical number for maxWarmingSearchers in example configs
on older versions is 2, while the intrinsic default is no limit
(Integer.MAX_VALUE).  Starting with 6.4.0, the intrinsic default has
been changed to 1, and the configuration has been removed from the
example configs.  Increasing it is almost always the wrong thing to do,
which is why the default has been lowered to 1.

https://issues.apache.org/jira/browse/SOLR-9712

https://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

On the master, you should set up automatic commits as I described above
and do not make explicit commit requests from update clients.  On the
slaves, autoCommit should be set up just like the master, but the other
automatic settings aren't typically necessary.  On slaves, as already
mentioned, commits only happen when the index is replicated from the
master -- you generally don't need to worry about any special
commit-related configuration, aside from making sure that the
autowarmCount value on the caches is not too high.  Masters that do not
receive queries can have autowarmCount set to zero, which can improve
commit speed by making the searcher open faster.
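
As a concrete example, a query-free master can define its caches with zero
autowarming, something like this (a sketch; the sizes are illustrative):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>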

To fix problems with exceeding the warming searcher limit, you must
reduce the commit frequency or make commits happen faster.

Side issue:  If you don't want the verbosity of INFO logging, which is
really noisy, set it to WARN.  A properly configured Solr server that is
not having problems should not log ANYTHING when the severity is WARN. 
If the configuration is not optimal, you may see some WARN messages. 
Setting the level to SEVERE is extremely restrictive, and will prevent
you from seeing informative error messages when problems happen.

Recent Solr versions do have a tendency to log information like this
repeatedly, followed by a stacktrace:

2017-04-14 19:40:00.207 WARN  (qtp895947612-598) [   x:spark2live]
o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_o0e]
java.nio.file.NoSuchFileException:
/index/solr6/data/data/spark2_0/index/segments_o0e

We have an issue filed for this message, but it hasn't yet been fixed. 
It does not seem to cause actual problems, just an annoying log
message.  Until the reason for this error is found and 

Re: Getting error while excuting full import

2017-04-18 Thread Mikhail Khludnev
Hello,

Shouldn't it just be where="PROPERTY_ID=PROPERTY_ID'"  since fields are
named the same in both tables.
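
For clarity, the shape I have in mind is roughly the following (a sketch only,
reusing the table and column names from this thread; double-check the
where/join syntax against the DIH documentation for your version). Note that
both queries must be ordered by the join key for zipper to work:

<entity name="propertiesList"
        query="SELECT PROPERTY_ID FROM property ORDER BY PROPERTY_ID">
  <entity name="property_text" join="zipper"
          where="PROPERTY_ID=PROPERTY_ID"
          query="SELECT * FROM property_text ORDER BY PROPERTY_ID"/>
</entity>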

On Tue, Apr 18, 2017 at 4:02 PM, ankur.168  wrote:

> Hi Mikhail,
>
> I tried with a simplest zipper entity. Here are the config details-
>
> 
>  query="SELECT
> PROPERTY_ID FROM property order by PROPERTY_ID">
>
> 
>
>  where="PROPERTY_ID=PROPERTY_ID'" query="SELECT * FROM
> property_text ORDER BY PROPERTY_ID"
> join="zipper">
> 
> 
> 
> 
>
> Here the child entity has multiple records for a given property id. Hence I
> believe the full import is failing. I have added new logs below. Is there a way
> for Zipper to support merging multiple records?
>
> Caused by: java.lang.IllegalArgumentException: expect strictly increasing
> primary keys for Relation PROPERTY_ID='${propertiesList.PROPERTY_ID}'
> got: ,
> at org.apache.solr.handler.dataimport.Zipper.onNewParent(
> Zipper.java:108)
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Getting-error-while-excuting-full-import-tp4329153p4330488.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Moving solr home

2017-04-18 Thread tedsolr
Looks like the issues are self-inflicted. I have custom start/stop scripts
that actually specify the solr home directory as a param to the start
command (start -c -s ...). This was overriding my include variable. As for
the magical solr.xml file, that's also my doing because as part of shutdown
the script copies config files where they need to go.

When I wrote all this 2 years ago I needed the flexibility to run multiple
nodes on a single host (because requisitioning a server took an act of
congress). 
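
For anyone who trips over the same thing: an explicit -s on the command line
takes precedence over the include file, so the two have to agree. A minimal
sketch of the include-file approach (paths are examples):

# bin/solr.in.sh
SOLR_HOME=/var/solr/data
SOLR_PID_DIR=/var/solr

# then start without -s so the variable is actually honored:
bin/solr start -c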


Susheel Kumar-3 wrote
> Hi,
> 
> Did you try to utilise the service installation scripts to deploy?  This
> makes it very easy for Prod deployments and allows to decouple data/index
> directory with Solr binaries. See below link
> 
> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
> 
> Thanks,
> Susheel
> 
> On Mon, Apr 17, 2017 at 3:46 PM, tedsolr 

> tsmith@

>  wrote:
> 
>> I have a solr cloud cluster (v5.2.1 on redhat linux) that uses the
>> default
>> location for solr home: (install dir)/server/solr. I would like to move
>> the
>> index data somewhere else to make upgrades easier. When I set a SOLR_HOME
>> variable  solr appears to be ignoring it - and even creating a solr.xml
>> file
>> in (install dir)/server/solr. Is solr doing some auto detect instead of
>> using the defined SOLR_HOME variable?
>>
>> pre-reqs:
>> - copied all index data to new solr home
>> - placed a solr.xml file in new solr home
>> - set SOLR_HOME & SOLR_PID_DIR in (install dir)/bin/solr.in.sh
>> - deleted solr.xml from default solr home
>>
>> then restart solr nodes:
>> - the admin console still shows solr.solr.home as default
>> - solr.xml gets recreated in the default solr home folder
>>
>> Anyone know what I'm missing?
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/Moving-solr-home-tp4330350.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Moving-solr-home-tp4330350p4330489.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Joel Bernstein
I looked through the test cases; I don't think we have this covered exactly
as it's written.  Can you log a jira for this?

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean  wrote:

> Hi,
>
> I do not understand what I am doing wrong in this simple query.
>
> curl --data-urlencode 'expr=innerJoin(
> search(books,
>q="*:*",
>fl="id",
>sort="id asc"),
> search(reviews,
>q="*:*",
>fl="id_book_s",
>sort="id_book_s asc"),
> on="id=id_books_s"
> )' http://localhost:8983/solr/books/stream
>
> {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
> stream comparators (sort) must be a superset of this stream's
> equalitor.","EOF":true}]}}
>
>
> It is totally similar to the documentation example
> 
> innerJoin(
>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
>   on="personId=ownerId"
> )
>
>
>
> Queries on each collection give :
>
> $ curl --data-urlencode 'expr=search(books,
>q="*:*",
>fl="id, title_s, pubyear_i",
>sort="pubyear_i asc",
>qt="/export")'
> http://localhost:8983/solr/books/stream
>
> {
>   "result-set": {
> "docs": [
>   {
> "title_s": "Friends",
> "pubyear_i": 1994,
> "id": "book2"
>   },
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "id": "book1"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 16
>   }
> ]
>   }
> }
>
>
> $ curl --data-urlencode 'expr=search(reviews,
>q="author_s:d*",
>fl="id, id_book_s, stars_i, review_dt",
>sort="id_book_s asc",
>qt="/export")'
> http://localhost:8983/solr/reviews/stream
>
> {
>   "result-set": {
> "docs": [
>   {
> "stars_i": 3,
> "id": "book1_c2",
> "id_book_s": "book1",
> "review_dt": "2014-03-15T12:00:00Z"
>   },
>   {
> "stars_i": 4,
> "id": "book1_c3",
> "id_book_s": "book1",
> "review_dt": "2014-12-15T12:00:00Z"
>   },
>   {
> "stars_i": 3,
> "id": "book2_c2",
> "id_book_s": "book2",
> "review_dt": "1994-03-15T12:00:00Z"
>   },
>   {
> "stars_i": 4,
> "id": "book2_c3",
> "id_book_s": "book2",
> "review_dt": "1994-12-15T12:00:00Z"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 47
>   }
> ]
>   }
> }
>
>
> Can someone help me to find my mistake?
>
> Regards
>
> Dominique
> --
> Dominique Béjean
> 06 08 46 12 43
>


Re: Get handler not working

2017-04-18 Thread PeterCiuffetti
We've bumped into this issue too, but it was through the MoreLikeThis query
parser.  Internally it uses the get handler to obtain the seed document. 
One of our SOLR collections uses a shard router that is not the document id. 
The get handler will fail if the value of the document id is not the same as
the routing key.  And then this causes the CloudMLTQParser to issue the
message "Error completing MLT request. Could not fetch document with id
[/x/]"



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-handler-not-working-tp4325130p4330485.html
Sent from the Solr - User mailing list archive at Nabble.com.


Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
Hi,

I do not understand what I am doing wrong in this simple query.

curl --data-urlencode 'expr=innerJoin(
search(books,
   q="*:*",
   fl="id",
   sort="id asc"),
search(reviews,
   q="*:*",
   fl="id_book_s",
   sort="id_book_s asc"),
on="id=id_books_s"
)' http://localhost:8983/solr/books/stream

{"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming
stream comparators (sort) must be a superset of this stream's
equalitor.","EOF":true}]}}


It is totally similar to the documentation example

innerJoin(
  search(people, q=*:*, fl="personId,name", sort="personId asc"),
  search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
  on="personId=ownerId"
)



Queries on each collection give :

$ curl --data-urlencode 'expr=search(books,
   q="*:*",
   fl="id, title_s, pubyear_i",
   sort="pubyear_i asc",
   qt="/export")'
http://localhost:8983/solr/books/stream

{
  "result-set": {
"docs": [
  {
"title_s": "Friends",
"pubyear_i": 1994,
"id": "book2"
  },
  {
"title_s": "The Way of Kings",
"pubyear_i": 2010,
"id": "book1"
  },
  {
"EOF": true,
"RESPONSE_TIME": 16
  }
]
  }
}


$ curl --data-urlencode 'expr=search(reviews,
   q="author_s:d*",
   fl="id, id_book_s, stars_i, review_dt",
   sort="id_book_s asc",
   qt="/export")'
http://localhost:8983/solr/reviews/stream

{
  "result-set": {
"docs": [
  {
"stars_i": 3,
"id": "book1_c2",
"id_book_s": "book1",
"review_dt": "2014-03-15T12:00:00Z"
  },
  {
"stars_i": 4,
"id": "book1_c3",
"id_book_s": "book1",
"review_dt": "2014-12-15T12:00:00Z"
  },
  {
"stars_i": 3,
"id": "book2_c2",
"id_book_s": "book2",
"review_dt": "1994-03-15T12:00:00Z"
  },
  {
"stars_i": 4,
"id": "book2_c3",
"id_book_s": "book2",
"review_dt": "1994-12-15T12:00:00Z"
  },
  {
"EOF": true,
"RESPONSE_TIME": 47
  }
]
  }
}


Can someone help me to find my mistake?

Regards

Dominique
-- 
Dominique Béjean
06 08 46 12 43


Re: Solr Child="true" flag in version 6.4

2017-04-18 Thread Alexandre Rafalovitch
I am not sure I can explain it better than the link I gave. Basically
you select parent records and then use fl=*,[child] to add
children records into that.
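
A hedged example of the request I mean (field names are taken from your
sample data, and I am assuming that only the parent documents carry the
category field, which is what parentFilter needs in order to identify
parents):

q=category:Shirt
fl=*,[child parentFilter=category:*]

With curl, remember to URL-encode the fl value (the brackets and spaces).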

I mostly wanted to make sure you knew about the flatten-by-default search.

Another way is to look for a common _root_ field value. It will be the same
for all records in the same block (parent, children, grandchildren,
etc). Of course, the _root_ field is usually not stored, so it is a bit
hard to get to it. But you can look at all the indexed values that
field has in Admin UI's schema screen and then search by that value to
see how many records show up.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 18 April 2017 at 12:36, donjose  wrote:
> Hello Alex,
>
> Thanks for your reply. This is the first time I am working with nested
> entities.
> Yes, you are right, I am getting a flat list combining parent & child.
> Could you please explain in a bit more detail how to apply the child transformer
> to the below-mentioned response.
>
>
> Response
> ==
> {
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"*:*",
>   "indent":"on",
>   "wt":"json",
>   "_":"1492506329597"}},
>   "response":{"numFound":3,"start":0,"docs":[
>   {
> "color":"Red",
> "id":"1"},
>   {
> "color":"Blue",
> "id":"2"},
>   {
> "category":["Shirt"],
> "id":"1",
> "_version_":1565006509454131200}]
>   }}
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Child-true-flag-in-version-6-4-tp4330312p4330459.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Index and query time suggester behavior in a SolrCloud environment

2017-04-18 Thread Andrea Gazzarini

Hi,
I have a project, with SolrCloud, where I'm going to use the Suggester 
component (BlendedInfixLookupFactory with DocumentDictionaryFactory).

Some info:

 * I will have a suggest-only collection, with no NRT requirements
   (indexes will be updated with a daily frequency)
 * I'm not yet sure about the replication factor (I have to do some checks)
 * I'm using Solrj on the client side

After reading some documentation I have a couple of doubts:

 * how does the *suggest.build* command work? Can I issue this command
   towards just one node, and have that node forward the request to the
   other nodes (so each of them can build its own suggester index portion)?
 * how do things work at query time? Can I send a request with
   only suggest.q=... to my /suggest request handler and get back
   distributed suggestions? (A sketch of the requests I mean follows below.)
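
To make the question concrete, these are the requests I have in mind (a
sketch; host, collection, handler and dictionary names are placeholders):

# build the suggester dictionary (question 1)
curl "http://node1:8983/solr/suggestions/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true"

# query it (question 2)
curl "http://node1:8983/solr/suggestions/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=andre"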

Thanks in advance
Andrea


Re: Solr Child="true" flag in version 6.4

2017-04-18 Thread donjose
Hello Alex,

Thanks for your reply. This is the first time I am working with nested
entities.
Yes, you are right, I am getting a flat list combining parent & child.
Could you please explain in a bit more detail how to apply the child transformer
to the below-mentioned response.


Response
==
{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "q":"*:*",
  "indent":"on",
  "wt":"json",
  "_":"1492506329597"}},
  "response":{"numFound":3,"start":0,"docs":[
  {
"color":"Red",
"id":"1"},
  {
"color":"Blue",
"id":"2"},
  {
"category":["Shirt"],
"id":"1",
"_version_":1565006509454131200}]
  }}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Child-true-flag-in-version-6-4-tp4330312p4330459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting error while excuting full import

2017-04-18 Thread ankur.168
Hi Mikhail,

Thanks for replying,

I am currently trying to use the zipper join but am getting a null pointer
exception, as shown in the stack trace below.

2017-04-18 09:11:51.154 INFO  (qtp1348949648-13) [   x:sample_content]
o.a.s.u.p.LogUpdateProcessorFactory [sample_content]  webapp=/solr
path=/dataimport
params={debug=true=on=true=0=true=10=full-import=false=sample_content=false=dataimport=json&_=1492506703156}{deleteByQuery=*:*
(-1565006716610805760)} 0 615
2017-04-18 09:11:51.173 ERROR (qtp1348949648-13) [   x:sample_content]
o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:475)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 34 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:247)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 36 more
Caused by: java.lang.NullPointerException
at

Re: AnalyzingInfixSuggester performance

2017-04-18 Thread Michael McCandless
AnalyzingInfixSuggester uses an index-time sort to sort all postings by the
suggest weight, so that lookup, as long as you sort by the suggest weight,
is extremely fast.

But if you need to rank at lookup time by something not "congruent" with
the index-time sort then you lose that benefit.

Mike McCandless

http://blog.mikemccandless.com
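
To illustrate the index-time weight in question, the suggester definition
points at a weight field when the dictionary is built, roughly like this (a
sketch; the field names are placeholders and not from the original question):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">city</str>
    <str name="weightField">population</str>
    <str name="payloadField">payload</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

Ranking by anything other than that weight (for example a per-request country
boost) falls outside what the pre-sorted structure can answer cheaply, which
is the trade-off being described.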

On Sun, Apr 16, 2017 at 11:46 AM, OTH  wrote:

> Hello,
>
> From what I understand, the AnalyzingInfixSuggester is using a simple
> Lucene query; so I was wondering, how then would this suggester have better
> performance than using a simple Solr 'select' query on a regular Solr index
> (with an asterisk placed at the start and end of the query string).  I
> could understand why say an FST based suggester would be faster, but I
> wanted to confirm if that indeed is the case with AnalyzingInfixSuggester.
>
> One reason I ask is:
> I needed the results to be boosted based on the value of another field;
> e.g., if a user in the UK is searching for cities, then I'd need the cities
> which are in the UK to be boosted.  I was able to do this with a regular
> Solr index by adding something like these parameters:
> defType=edismax&bq=country:UK^2.0
>
> However, I'm not sure if this is possible with the Suggester.  Moreover -
> other than the 'country' field above, there are other fields as well which
> I need to be returned with the results.  Since the Suggester seems to only
> allow one additional field, called 'payload', I'm able to do this by
> putting the values of all the other fields into a JSON and then placing
> that into the 'payload' field - however, I don't know if it would be
> possible then to incorporate the boosting mechanism I showed above.
>
> So I was thinking of just using a regular Solr index instead of the
> Suggester; I wanted to confirm, what if any is the performance improvement
> in using the AnalyzingInfixSuggester over using a regular index?
>
> Much thanks
>


Re: Solr Child="true" flag in version 6.4

2017-04-18 Thread Alexandre Rafalovitch
You say you are trying to use child=true, but the definition you gave
does not actually have one. Is it possible you tested with it once,
but then did not keep it for later tests accidentally?

Also, if that's your first time working with nested entities, the
query returns parents and children all together, so it looks like a
flat list even when it is indexed correctly as nested. It may help to
have child document transformer at least during the initial debug:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[child]-ChildDocTransformerFactory

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 18 April 2017 at 12:09, donjose  wrote:
> Verbose debug output
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 109
>   },
>   "initArgs": [
> "defaults",
> [
>   "config",
>   "data-config.xml"
> ]
>   ],
>   "command": "full-import",
>   "mode": "debug",
>   "documents": [
> {
>   "category": [
> "Shirt"
>   ],
>   "id": [
> "1"
>   ],
>   "_version_": [
> 1565006509454131200
>   ],
>   "_root_": [
> "1"
>   ]
> }
>   ],
>   "verbose-output": [
> "entity:ASSETCATEGORY",
> [
>   "document#1",
>   [
> "query",
> "SELECT id, category from category",
> "time-taken",
> "0:0:0.34",
> null,
> "--- row #1-",
> "CATEGORY",
> "Shirt",
> "ID",
> "1",
> null,
> "-",
> "entity:Values",
> [
>   "query",
>   "SELECT id,catsize,color from categoryvalues where categoryid =
> 1",
>   "time-taken",
>   "0:0:0.22",
>   null,
>   "--- row #1-",
>   "CATSIZE",
>   "XL",
>   "COLOR",
>   "Red",
>   "ID",
>   "1",
>   null,
>   "-",
>   null,
>   "--- row #2-",
>   "CATSIZE",
>   "XL",
>   "COLOR",
>   "Blue",
>   "ID",
>   "2",
>   null,
>   "-"
> ]
>   ],
>   "document#2",
>   []
> ]
>   ],
>   "status": "idle",
>   "importResponse": "",
>   "statusMessages": {
> "Total Requests made to DataSource": "2",
> "Total Rows Fetched": "3",
> "Total Documents Processed": "1",
> "Total Documents Skipped": "0",
> "Full Dump Started": "2017-04-18 09:08:33",
> "": "Indexing completed. Added/Updated: 1 documents. Deleted 0
> documents.",
> "Committed": "2017-04-18 09:08:33",
> "Time taken": "0:0:0.94"
>   }
> }
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Child-true-flag-in-version-6-4-tp4330312p4330451.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Child="true" flag in version 6.4

2017-04-18 Thread donjose
Verbose debug output
{
  "responseHeader": {
"status": 0,
"QTime": 109
  },
  "initArgs": [
"defaults",
[
  "config",
  "data-config.xml"
]
  ],
  "command": "full-import",
  "mode": "debug",
  "documents": [
{
  "category": [
"Shirt"
  ],
  "id": [
"1"
  ],
  "_version_": [
1565006509454131200
  ],
  "_root_": [
"1"
  ]
}
  ],
  "verbose-output": [
"entity:ASSETCATEGORY",
[
  "document#1",
  [
"query",
"SELECT id, category from category",
"time-taken",
"0:0:0.34",
null,
"--- row #1-",
"CATEGORY",
"Shirt",
"ID",
"1",
null,
"-",
"entity:Values",
[
  "query",
  "SELECT id,catsize,color from categoryvalues where categoryid =
1",
  "time-taken",
  "0:0:0.22",
  null,
  "--- row #1-",
  "CATSIZE",
  "XL",
  "COLOR",
  "Red",
  "ID",
  "1",
  null,
  "-",
  null,
  "--- row #2-",
  "CATSIZE",
  "XL",
  "COLOR",
  "Blue",
  "ID",
  "2",
  null,
  "-"
]
  ],
  "document#2",
  []
]
  ],
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
"Total Requests made to DataSource": "2",
"Total Rows Fetched": "3",
"Total Documents Processed": "1",
"Total Documents Skipped": "0",
"Full Dump Started": "2017-04-18 09:08:33",
"": "Indexing completed. Added/Updated: 1 documents. Deleted 0
documents.",
"Committed": "2017-04-18 09:08:33",
"Time taken": "0:0:0.94"
  }
}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Child-true-flag-in-version-6-4-tp4330312p4330451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter Facet Query

2017-04-18 Thread Alexandre Rafalovitch
Are you saying that all the values in the facet are zero with that
query? The query you gave seems to be the super-basic faceting code,
so maybe something super-basic is missing.

E.g.
*) Did you check that the documents you get back actually have any
values in that field to facet on?
*) Did you try making a query just by ID for a document that
definitely has the value in that field?
*) Did you do the query with echoParams=all to see that you are not
having any hidden extra parameters that get appended?
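
For the third check, a hedged example of the kind of request I mean (it also
folds in the facet.mincount suggestion from earlier in the thread, which
drops the zero-count buckets):

/select?q=content:test&facet=on&facet.field=research&facet.mincount=1&echoParams=all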

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 18 April 2017 at 11:43, Furkan KAMACI  wrote:
> OK, it returns 0 results every time.
>
> So,
>
> I want to filter out research values with empty string ("") from facet
> result. How can I do that?
>
>
> On Tue, Apr 18, 2017 at 8:53 AM, Furkan KAMACI 
> wrote:
>
>> The first problem is that they do not match the main query.
>>
>> 18 Nis 2017 Sal, saat 01:54 tarihinde Dave 
>> şunu yazdı:
>>
>>> Min.count is what you're looking for to get non 0 facets
>>>
>>> > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI 
>>> wrote:
>>> >
>>> > My query:
>>> >
>>> > /select?facet.field=research&facet=on&q=content:test
>>> >
>>> > Q1) Facet returns research values with 0 counts which has a research
>>> value
>>> > that is not from a document matched by main query (content:test). Is
>>> that
>>> > usual?
>>> >
>>> > Q2) I want to filter out research values with empty string ("") from
>>> facet
>>> > result. How can I do that?
>>> >
>>> > Kind Regards,
>>> > Furkan KAMACI
>>>
>>


Re: Filter Facet Query

2017-04-18 Thread Furkan KAMACI
OK, it returns 0 results every time.

So,

I want to filter out research values with empty string ("") from facet
result. How can I do that?


On Tue, Apr 18, 2017 at 8:53 AM, Furkan KAMACI 
wrote:

> The first problem is that they do not match the main query.
>
> 18 Nis 2017 Sal, saat 01:54 tarihinde Dave 
> şunu yazdı:
>
>> Min.count is what you're looking for to get non 0 facets
>>
>> > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI 
>> wrote:
>> >
>> > My query:
>> >
>> > /select?facet.field=research&facet=on&q=content:test
>> >
>> > Q1) Facet returns research values with 0 counts which has a research
>> value
>> > that is not from a document matched by main query (content:test). Is
>> that
>> > usual?
>> >
>> > Q2) I want to filter out research values with empty string ("") from
>> facet
>> > result. How can I do that?
>> >
>> > Kind Regards,
>> > Furkan KAMACI
>>
>


Re: Solr Child="true" flag in version 6.4

2017-04-18 Thread Mikhail Khludnev
Hello,

This is puzzling. Are you sure you have a recent DIH jar at that core?
Sometimes the old one can remain in the lib directory.
One odd thing in the config is that the category values are not limited with
something like WHERE categoryvalues.categoryid=${category.id}
Can you share verbose debug output?


18 апр. 2017 г. 8:48 пользователь "donjose" 
написал:

> I am getting flat structure.
> Expected Result :
> {
>  Category : "Shirt",
>  CategoryId : 1
>  Categoryvalues : [
>  {
>  id : 1,
>  size : XL,
> color : red
>  },
>  {
>  id : 2,
>  size : XL,
> color : blue
>  }
>
>  ]
> }
>
> Result i am getting:
>  Categoryvalues :
>  {
>  id : 1,
>  Category : "Shirt",
>  CategoryId : 1,
>  size : XL,
>  color : red
>  }
> ,
>  {
>  id : 2,
>  Category : "Shirt",
>  CategoryId : 1,
>  size : XL,
> color : blue
>  }
>
>
>
> Data-config.xml
> ===
> 
>  user="username" password="password" />
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>   
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble
> .com/Solr-Child-true-flag-in-version-6-4-tp4330312p4330437.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Getting error while excuting full import

2017-04-18 Thread ankur.168
Hi Erick,

Thanks for replying. As you suggest, I can use SolrJ to map RDBMS-fetched
data and index/search it later on, but DIH gives multiple DB connections for
full import, among other benefits.
Does SolrJ support this, or do we need to put in the effort to build a
multithreaded connection pool similar to DIH?
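
For reference, here is the kind of SolrJ sketch I am considering, assuming
ConcurrentUpdateSolrClient handles the multithreaded sends while a JDBC pool
on my side handles the DB connections (core and field names are placeholders):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexFromRdbms {
    public static void main(String[] args) throws Exception {
        // queue up to 1000 docs and flush them with 4 background threads
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient("http://localhost:8983/solr/properties", 1000, 4)) {
            // in real code this loop would read rows from JDBC (our own connection pool)
            for (int i = 0; i < 10; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "prop-" + i);
                doc.addField("property_name_s", "example " + i);
                client.add(doc);
            }
            client.commit();
        }
    }
}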



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-error-while-excuting-full-import-tp4329153p4330442.html
Sent from the Solr - User mailing list archive at Nabble.com.