Query regarding multi-tenant composite ID document routing

2017-10-31 Thread Ketan Thanki
Hi,

I need help with the query mentioned below.

I have 2 collections, each with 4 shards and 4 replicas, and I want to
implement composite-ID document routing for my unique field 'Id', as described
below.
e.g.: projectId:158380, modelId:3606, where the tenant bits are used as a
projectId/2!modelId/2 prefix on Id, and Id is the unique Solr document ID
built from those values.

Query: how will this be indexed in Solr when using bits? Does the ID need to
be indexed with the '/' or not, as below?
Like "158380/2!3606/2!Id" OR "id":"79190!1803!Id"

Also, which value do I need to pass in the _route_ parameter when querying?

Please do the needful.
Ketan.
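
For reference, a minimal SolrJ sketch of both halves of this (zkHost,
collection name, and values are illustrative): with the compositeId router
the "/2" bits are part of the stored id itself, and the same prefix, bits
included, is what goes into the _route_ parameter.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeRoutingSketch {
  public static void main(String[] args) throws Exception {
    SolrClient client =
        new CloudSolrClient.Builder().withZkHost("zkhost:2181").build();

    // Index time: the bits markers live inside the unique id value.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "158380/2!3606/2!DOC42");  // projectId/2!modelId/2!docId
    client.add("mycollection", doc);
    client.commit("mycollection");

    // Query time: pass the same prefix, bits included, as _route_.
    SolrQuery q = new SolrQuery("*:*");
    q.set("_route_", "158380/2!3606/2!");
    client.query("mycollection", q);
    client.close();
  }
}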



Re: Solr streaming questions

2017-10-31 Thread Joel Bernstein
It is not possible to use score with the /export handler. The /export
handler currently only supports sorting by fields.

You can sort by score using the default /select handler.
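
For example, the same search routed through /select instead (a sketch; rows
must be set explicitly because /select returns a page rather than streaming
the full result set):

search(test-catalog-product-170724,
       defType="edismax",
       q="7732-18-5",
       qf="searchmv_cas_number",
       mm="2<-12%",
       fl="id_record_spec, id_s, score",
       sort="score desc",
       qt="/select",
       rows="1000")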

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Oct 31, 2017 at 1:50 PM, Webster Homer 
wrote:

> I have a potential use case for solr searching via streaming expressions.
> I am currently using solr 6.2.0, but we will soon be upgrading to the 7.1.0
> version.
>
> I started testing out searching using streaming expressions.
> 1. If I use an alias instead of a collection name it fails. I see that
> there is a Jira, SOLR-7377. Is this fixed in 7.1.0?
>
> 2. If I try to sort the results by score, it gives me an undefined field
> error. So it seems that streaming searches must not return values ordered
> by relevancy?
> This is a stopper for us if it has not been addressed.
>
> This is my query:
> search(test-catalog-product-170724, defType="edismax", q="7732-18-5",
> qf="searchmv_cas_number", mm="2<-12%", fl="id_record_spec, id_s, score",
> sort="score desc", qt="/export")
>
> This is the error:
> "EXCEPTION": "java.util.concurrent.ExecutionException:
> java.io.IOException:
> -->
> http://141.247.245.207:8983/solr/test-catalog-product-170724_shard2_replica1/:org.apache.solr.common.SolrException:
> undefined field: \"score\"",
>
> I could not find a Jira for this issue. Is it not possible to retrieve the
> results ordered by relevancy (score desc)?
>
> Seems kind of limiting
>


SOLR-11504: Provide a config to restrict number of indexing threads

2017-10-31 Thread Nawab Zada Asad Iqbal
Hi,

I hit this issue https://issues.apache.org/jira/browse/SOLR-11504 while
migrating to Solr 6, and I am working around it locally in Lucene code. I am
thinking of fixing it properly and hopefully contributing the patch back to
Solr. Since the Lucene code does not want to keep any such config, I am
thinking of using a counting semaphore in the Solr code before calling
IndexWriter.addDocument(s) or IndexWriter.updateDocument(s).


IndexWriter.addDocument(s) and updateDocument(s) are used in
DirectUpdateHandler2 and FileBasedSpellChecker.java. Since normal
document indexing goes through DirectUpdateHandler2, I am thinking of
throttling the number of indexing threads only in this class. Does this make
sense?
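
For what it's worth, a minimal sketch of the semaphore idea (the permit
count and class name are illustrative; the real limit would come from
whatever config knob gets added):

import java.io.IOException;
import java.util.concurrent.Semaphore;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexableField;

class ThrottledAdds {
  // Cap on concurrent addDocument/updateDocument calls.
  private final Semaphore indexingPermits;

  ThrottledAdds(int maxIndexingThreads) {
    this.indexingPermits = new Semaphore(maxIndexingThreads, true); // fair
  }

  long addDocument(IndexWriter writer, Iterable<? extends IndexableField> doc)
      throws IOException, InterruptedException {
    indexingPermits.acquire();     // blocks once the cap is reached
    try {
      return writer.addDocument(doc);
    } finally {
      indexingPermits.release();   // always release, even on failure
    }
  }
}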

Can anyone mentor me for this and review my change?


Thanks
Nawab


Re: Incomplete Index

2017-10-31 Thread Rick Leir
Dawg,
I have a similar setup, and this is what works for me. I have a field which
contains a timestamp. The timestamp is set to be identical for all documents
added or updated in a run. When the run is complete and some/many documents
have been overwritten, I can delete all un-updated documents easily: they
still carry a previous timestamp.
Cheers -- Rick
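
A sketch of that pattern with SolrJ (field, collection, and URL names are
illustrative):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RunStampPurge {
  public static void main(String[] args) throws Exception {
    SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    String runStamp = "2017-10-31T00:00:00Z";  // identical for the whole run

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "sku-12345");
    doc.addField("indexed_at", runStamp);
    client.add("products", doc);
    // ... add the rest of the run ...
    client.commit("products");

    // Anything not overwritten in this run still has an older stamp: purge it.
    client.deleteByQuery("products", "indexed_at:{* TO " + runStamp + "}");
    client.commit("products");
    client.close();
  }
}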


On October 31, 2017 7:54:22 AM EDT, "Emir Arnautović" 
 wrote:
>Hi,
>There is a possibility that you ended up with documents with the same
>ID and that you are overwriting documents instead of writing new ones.
>
>In any case, I would suggest you change your approach in case you have
>enough disk space to keep two copies of indices:
>1. use alias to read data from index instead of index name
>2. index data into new index
>3. after verification (e.g. quick check would be number of docs) switch
>alias to new index
>4. keep old index available in case you need to switch back.
>5. before indexing next index, delete one from previous day to free up
>space.
>
>In case you have updates during the day you have to account for that as
>well - stop updating while indexing the new index; update both indices if
>you want to be able to switch back at any point; etc.
>
>HTH,
>Emir
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 31 Oct 2017, at 11:20, o1webdawg  wrote:
>> 
>> I have an index with about a million documents.  It is the backend for a
>> shopping cart system.
>> 
>> Sometimes the inventory gets out of sync with solr and the storefront
>> contains out of stock items.
>> 
>> So I setup a scheduled task on the server to run at 12am every morning to
>> delete the entire solr index.
>> 
>> Then at 12:04am I run another scheduled task to re-index the SQL database
>> containing the inventory.
>> 
>> Well, today I checked it around 4am and only a fraction of the products
>> are in the solr index.
>> 
>> However, it did not seem to be idle, and reloading it showed lots of
>> deleted documents.
>> 
>> I open up the core and the deletes keep going up, max docs goes up, but
>> the total docs stays the same.
>> 
>> It's really confusing to me what is happening at this point and why I am
>> seeing these document counts.
>> 
>> My theory is that the 12am delete is still running 5 hours later at the
>> same time as the re-indexing.
>> 
>> That's the only way I can explain this really odd behavior with my
>> limited knowledge.
>> 
>> Is my theory realistic and could the delete still be running?
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Graph Traversal

2017-10-31 Thread Kojo
Everything is working fine; this functional programming is amazing.
Thank you!
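
For anyone following along, the pipeline under discussion looks roughly like
this (a sketch assuming Solr 6.6+ and a multivalued field named fruits):

cartesianProduct(
  search(mycollection, q="*:*", fl="id,fruits", sort="id asc", qt="/export"),
  fruits
)

Each incoming tuple is expanded into one tuple per value of fruits, so the
later stages of the pipeline only ever see single-valued fields.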

2017-10-31 12:31 GMT-02:00 Kojo :

> Thank you, I am just starting with Streaming Expressions. I will try this
> one later.
>
> I will open another thread, because I can't do some simple queries using
> Streaming Expressions.
>
>
>
>
> 2017-10-30 12:11 GMT-02:00 Pratik Patel :
>
>> You use this in query time. Since Streaming Expressions can be pipelined,
>> the next stage/function of pipeline will work on the new tuples generated.
>>
>> On Mon, Oct 30, 2017 at 10:09 AM, Kojo  wrote:
>>
>> > Do you store this new tuples, created by Streaming Expressions, in a new
>> > Solr cloud collection? Or just use this tuples in query time?
>> >
>> > 2017-10-30 11:00 GMT-02:00 Pratik Patel :
>> >
>> > > By including Cartesian function in Streaming Expression pipeline, you
>> can
>> > > convert a tuple having one multivalued field into multiple tuples
>> where
>> > > each tuple holds one value for the field which was originally
>> > multivalued.
>> > >
>> > > For example, if you have following document.
>> > >
>> > > { id: someID, fruits: [apple, orange, banana] }   // fruits is
>> > multivalued
>> > > > field
>> > >
>> > >
>> > > Applying Cartesian function would give following tuples.
>> > >
>> > > { id: someID , fruits: apple }, { id: someID, fruits: orange }, {id:
>> > > > someID, fruits: banana }
>> > >
>> > >
>> > > Now that fruits holds single values, you can also use any Streaming
>> > > Expression functions which don't work with multivalued fields. This
>> > happens
>> > > in the Streaming Expression pipeline so you don't have to flatten your
>> > > documents in index.
>> > >
>> > > On Mon, Oct 30, 2017 at 8:39 AM, Kojo  wrote:
>> > >
>> > > > Hi,
>> > > > just a question, I have no deep background on Solr, Graph etc.
>> > > > This solution looks like normalizing data like a m2m table in sql
>> > > database,
>> > > > is it?
>> > > >
>> > > >
>> > > >
>> > > > 2017-10-29 21:51 GMT-02:00 Pratik Patel :
>> > > >
>> > > > > For now, you can probably use Cartesian function of Streaming
>> > > Expressions
>> > > > > which Joel implemented to solve the same problem.
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/SOLR-10292
>> > > > > http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-
>> > > > > coming-in-solr-66.html
>> > > > >
>> > > > > Regards,
>> > > > > Pratik
>> > > > >
>> > > > > On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <
>> joels...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I don't see a jira ticket for this yet. Feel free to create it
>> and
>> > > > reply
>> > > > > > back with the link.
>> > > > > >
>> > > > > > Joel Bernstein
>> > > > > > http://joelsolr.blogspot.com/
>> > > > > >
>> > > > > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo 
>> wrote:
>> > > > > >
>> > > > > > > Hi, I was looking for information on Graph Traversal. More
>> > > > > specifically,
>> > > > > > > support to search graph on multivalued field.
>> > > > > > >
>> > > > > > > Searching on the Internet, I found a question exactly the
>> same of
>> > > > mine,
>> > > > > > > with an answer that what I need is not implemented yet:
>> > > > > > > http://lucene.472066.n3.nabble.com/Using-multi-valued-
>> > > > > > > field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
>> > > > > > >
>> > > > > > >
>> > > > > > > Is there a ticket on Jira to follow the implementation of
>> search
>> > > > graph
>> > > > > on
>> > > > > > > multivalued field?
>> > > > > > >
>> > > > > > > Thank you,
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


RE: Stateless queries to secured SOLR server.

2017-10-31 Thread Phil Scadden
Thanks Shawn. I have done it with SolrJ. Apart from needing the 
NoopResponseParser to handle the wt=, it was pretty painless.
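
Roughly what that looks like, for the archives (a sketch; URL, collection,
and credentials are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.NoOpResponseParser;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.util.NamedList;

public class StatelessQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build();

    // Pass the wt= format through untouched instead of parsing javabin.
    NoOpResponseParser rawJson = new NoOpResponseParser();
    rawJson.setWriterType("json");

    QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
    req.setResponseParser(rawJson);
    // Stateless: basic-auth credentials ride along on every request.
    req.setBasicAuthCredentials("solruser", "password");

    NamedList<Object> result = client.request(req);
    System.out.println((String) result.get("response"));
    client.close();
  }
}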

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, 1 November 2017 2:43 a.m.
To: solr-user@lucene.apache.org
Subject: Re: Stateless queries to secured SOLR server.

On 10/29/2017 6:13 PM, Phil Scadden wrote:
> While SOLR is behind a firewall, I want to now move to a secured SOLR 
> environment. I had been hoping to keep SOLRJ out of the picture and just 
> using httpURLConnection. However, I also don't want to maintain session 
> state, preferring to send authentication with every request. Is this possible 
> with basic Authorization?

I do not know a lot about the authentication in Solr, but I do know that it's 
typically using HTTP basic authentication.  As I understand it, for this kind 
of authentication, every request will require the credentials.

I am not aware of any state/session capability where Solr's HTTP API is 
concerned.  As far as I know, the closest Solr comes to this is that certain 
things, particularly the Collections API, are async capable, where you start a 
process with one HTTP call and then you can make further requests to check 
whether it's done.

If your software is written in Java, I would strongly recommend SolrJ, rather 
than constructing the HTTP calls yourself.  The code is easier to write and 
understand.  For other languages, there are third-party Solr client libraries 
available.

Thanks,
Shawn



Re: Solr response with original value

2017-10-31 Thread Venkateswarlu Bommineni
Thanks for the reply Shawn.

But I am a little confused about faceting on one field and returning the
result of another field.
Could you please give a sample query? Thanks a lot in advance!
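
(For illustration, a sketch of such a query, assuming a copyField target
named productSuggestionRaw that holds the raw string:)

q=*:*&rows=0&facet=true&facet.field=productSuggestionRaw&facet.prefix=Life

Note that the prefix is case-sensitive on the unanalyzed copy, so 'Life'
rather than 'life'.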

On Tue, Oct 31, 2017 at 8:23 PM, Shawn Heisey  wrote:

> On 10/31/2017 5:15 AM, Venkateswarlu Bommineni wrote:
> > Please suggest how to achieve below scenario.
> >
> > I have a field '*productSuggestion*' , below is the configuration.
> >
> > <field name="productSuggestion" type="..." indexed="true"
> > stored="true" multiValued="false" />
> >
> > <fieldType name="..." class="solr.TextField">
> >   <analyzer>
> >     ... (includes a LowerCaseFilterFactory) ...
> >   </analyzer>
> > </fieldType>
> >
> > when I use the above field in facets I am getting only indexed values
> > (lowercase values), but I need the original value.
>
> One option is to change this field so it doesn't modify the text for the
> index.  If you changed "solr.TextField" to "solr.StrField", completely
> removed the analyzer sections, enabled docValues, and reindexed, that
> would be accomplished.
>
> But if you must have the lowercase filter on this field, then you're
> going to have to do what Emir suggested -- use the "copyField" tag in
> your schema to copy the field contents (which will always be the
> original value, never the analyzed value) to another field that uses a
> type with the "solr.StrField" class and has no analysis chain, and facet
> on that field instead.  I would recommend adding docValues to the new
> field, for facet performance.  To keep the index size down, you should
> turn off any of the other features on the new field that you don't need
> -- indexed, stored, etc.
>
> For any of these schema changes, you must completely reindex.
>
> Thanks,
> Shawn
>
>


Re: mvn test failing

2017-10-31 Thread Daniel Collins
I just ran the tests on master and got the following failures; this may be
what Tarique means:


TestFoldingMultitermExtrasQuery.beforeTests:36->SolrTestCaseJ4.initCore:552->SolrTestCaseJ4.initCore:678->SolrTestCaseJ4.createCore:688
» Runtime

TestFoldingMultitermExtrasQuery.org.apache.solr.analysis.TestFoldingMultitermExtrasQuery
» ThreadLeak

TestICUCollationFieldOptions.beforeClass:33->SolrTestCaseJ4.initCore:552->SolrTestCaseJ4.initCore:678->SolrTestCaseJ4.createCore:688
» Runtime

TestICUCollationFieldOptions.org.apache.solr.schema.TestICUCollationFieldOptions
» ThreadLeak

TestICUCollationFieldDocValues.beforeClass:39->SolrTestCaseJ4.initCore:552->SolrTestCaseJ4.initCore:678->SolrTestCaseJ4.createCore:688
» Runtime

TestICUCollationFieldDocValues.org.apache.solr.schema.TestICUCollationFieldDocValues
» ThreadLeak

TestICUCollationField.beforeClass:42->SolrTestCaseJ4.initCore:552->SolrTestCaseJ4.initCore:678->SolrTestCaseJ4.createCore:688
» Runtime
  TestICUCollationField.org.apache.solr.schema.TestICUCollationField »
ThreadLeak

Strangely, they work fine from ant...  Will try to dig a bit.


On 31 October 2017 at 14:53, Daniel Collins  wrote:

> Another important question is which branch did you download?  I assume
> master as it's the default, but remember that is a development branch, so it
> is entirely possible to have some test issues on that.
>
> On 31 October 2017 at 13:44, Shawn Heisey  wrote:
>
>> On 10/28/2017 11:48 PM, Tarique Anwer wrote:
>> > I am new to Solr.
>> > I am trying to build Solr from source code using Maven.
>> > So I performed the following steps:
>> >
>> > 1. Download the source code zip from https://github.com/apache/luce
>> ne-solr
>> > 2. unzip & run from top level dir:
>> >   $ ant get-maven-poms
>> > $ cd maven-build
>>
>> Maven is not the official build system.  It is included as an alternate
>> option, but doesn't get the same attention as the official system.
>>
>> The maven output gave you the location of more details about the tests
>> that failed.  Look there for more information.
>>
>> Or install/use ant, which is the official build system for Lucene and
>> Solr, and gives more information about test failures as part of the
>> build output.
>>
>> https://wiki.apache.org/solr/HowToContribute
>>
>> Sometimes Solr tests fail, even on released code.  Such failures are
>> investigated.  Sometimes the test itself is faulty, sometimes it's the
>> code that is being tested.  Sometimes the test framework randomizes a
>> combination of settings that doesn't actually work.
>>
>> Unless you need to have access to the source code for learning purposes,
>> or because you need to engage in development related to Solr, download
>> the binary release and don't worry about compiling it.
>>
>> Thanks,
>> Shawn
>>
>>
>


Solr streaming questions

2017-10-31 Thread Webster Homer
I have a potential use case for solr searching via streaming expressions.
I am currently using solr 6.2.0, but we will soon be upgrading to the 7.1.0
version.

I started testing out searching using streaming expressions.
1. If I use an alias instead of a collection name it fails. I see that
there is a Jira, SOLR-7377. Is this fixed in 7.1.0?

2. If I try to sort the results by score, it gives me an undefined field
error. So it seems that streaming searches must not return values ordered
by relevancy?
This is a stopper for us if it has not been addressed.

This is my query:
search(test-catalog-product-170724,defType="edismax",q="7732-18-5",qf="searchmv_cas_number",mm="2<-12%",fl="id_record_spec,
id_s, score",sort="score desc",qt="/export")

This is the error:
"EXCEPTION": "java.util.concurrent.ExecutionException: java.io.IOException:
-->
http://141.247.245.207:8983/solr/test-catalog-product-170724_shard2_replica1/:org.apache.solr.common.SolrException:
undefined field: \"score\"",

I could not find a Jira for this issue. Is it not possible to retrieve the
results ordered by relevancy (score desc)?

Seems kind of limiting



Re: max docs, deleted docs optimization

2017-10-31 Thread Erick Erickson
1> 2 lakh at most. If the standard background merging is going on it
may be less than that.

2> Some, but whether you notice or not is an open question. In an
index with only 10 lakh docs, it's unlikely even having 50% deleted
documents is going to make much of a difference.

3> Yes, the deleted docs are in the segment until it's merged away. Lucene
is very efficient (according to Mike McCandless) at skipping deleted
docs.

4> It rewrites all segments, purging deleted documents. However, it
has some pitfalls, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
In general it's simply not recommended to optimize. There is a Solr
JIRA discussing this in detail, but I can't get to the site to link it
right now.

In general, as an index is updated segments are merged together and
during that process any deleted documents are purged.
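
If deleted documents really do need to be reclaimed sooner, a commit with
expungeDeletes is the usual middle ground between waiting for natural merges
and a full optimize (a sketch; core name is illustrative):

curl "http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true"

That merges only segments carrying deletions rather than rewriting the
entire index.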

Two resources:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

See the third animation TieredMergePolicy which is the default here:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick

On Tue, Oct 31, 2017 at 4:40 AM, kshitij tyagi
 wrote:
> Hi,
>
> I am using atomic update to update one of the fields, I want to know :
>
> 1. if total docs in the core are 10 lakh and I partially update 2 lakh docs,
> then what will be the number of deleted docs?
>
> 2. Does a higher number of deleted docs affect query time? That is, does
> query time increase if there are more deleted docs?
>
> 3. Are deleted docs present in segments? Are deleted docs traversed
> during query execution?
>
> 4. What exactly does the optimize button in the Solr admin UI do?
>
> Help is much appreciated.
>
> Regards,
> Kshitij


Re: Solr response with original value

2017-10-31 Thread Shawn Heisey
On 10/31/2017 5:15 AM, Venkateswarlu Bommineni wrote:
> Please suggest how to achieve below scenario.
>
> I have a field '*productSuggestion*' , below is the configuration.
>
> <field name="productSuggestion" type="..." indexed="true"
> stored="true" multiValued="false" />
>
> <fieldType name="..." class="solr.TextField">
>   <analyzer>
>     ... (includes a LowerCaseFilterFactory) ...
>   </analyzer>
> </fieldType>
>
> when I use the above field in facets I am getting only indexed values
> (lowercase values), but I need the original value.

One option is to change this field so it doesn't modify the text for the
index.  If you changed "solr.TextField" to "solr.StrField", completely
removed the analyzer sections, enabled docValues, and reindexed, that
would be accomplished.

But if you must have the lowercase filter on this field, then you're
going to have to do what Emir suggested -- use the "copyField" tag in
your schema to copy the field contents (which will always be the
original value, never the analyzed value) to another field that uses a
type with the "solr.StrField" class and has no analysis chain, and facet
on that field instead.  I would recommend adding docValues to the new
field, for facet performance.  To keep the index size down, you should
turn off any of the other features on the new field that you don't need
-- indexed, stored, etc.
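
As a sketch, the schema additions could look like this (field and type names
are illustrative):

<field name="productSuggestion" type="text_suggest" indexed="true" stored="true" multiValued="false" />
<field name="productSuggestionRaw" type="string" indexed="false" stored="false" docValues="true" />
<copyField source="productSuggestion" dest="productSuggestionRaw" />

Faceting then targets productSuggestionRaw, which keeps the original case
because it is never analyzed.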

For any of these schema changes, you must completely reindex.

Thanks,
Shawn



Re: mvn test failing

2017-10-31 Thread Daniel Collins
Another important question is which branch did you download?  I assume
master as it's the default, but remember that is a development branch, so it
is entirely possible to have some test issues on that.

On 31 October 2017 at 13:44, Shawn Heisey  wrote:

> On 10/28/2017 11:48 PM, Tarique Anwer wrote:
> > I am new to Solr.
> > I am trying to build Solr from source code using Maven.
> > So I performed the following steps:
> >
> > 1. Download the source code zip from https://github.com/apache/
> lucene-solr
> > 2. unzip & run from top level dir:
> >   $ ant get-maven-poms
> > $ cd maven-build
>
> Maven is not the official build system.  It is included as an alternate
> option, but doesn't get the same attention as the official system.
>
> The maven output gave you the location of more details about the tests
> that failed.  Look there for more information.
>
> Or install/use ant, which is the official build system for Lucene and
> Solr, and gives more information about test failures as part of the
> build output.
>
> https://wiki.apache.org/solr/HowToContribute
>
> Sometimes Solr tests fail, even on released code.  Such failures are
> investigated.  Sometimes the test itself is faulty, sometimes it's the
> code that is being tested.  Sometimes the test framework randomizes a
> combination of settings that doesn't actually work.
>
> Unless you need to have access to the source code for learning purposes,
> or because you need to engage in development related to Solr, download
> the binary release and don't worry about compiling it.
>
> Thanks,
> Shawn
>
>


Re: Graph Traversal

2017-10-31 Thread Kojo
Thank you, I am just starting with Streaming Expressions. I will try this
one later.

I will open another thread, because I can't do some simple queries using
Streaming Expressions.




2017-10-30 12:11 GMT-02:00 Pratik Patel :

> You use this in query time. Since Streaming Expressions can be pipelined,
> the next stage/function of pipeline will work on the new tuples generated.
>
> On Mon, Oct 30, 2017 at 10:09 AM, Kojo  wrote:
>
> > Do you store this new tuples, created by Streaming Expressions, in a new
> > Solr cloud collection? Or just use this tuples in query time?
> >
> > 2017-10-30 11:00 GMT-02:00 Pratik Patel :
> >
> > > By including Cartesian function in Streaming Expression pipeline, you
> can
> > > convert a tuple having one multivalued field into multiple tuples where
> > > each tuple holds one value for the field which was originally
> > multivalued.
> > >
> > > For example, if you have following document.
> > >
> > > { id: someID, fruits: [apple, orange, banana] }   // fruits is
> > multivalued
> > > > field
> > >
> > >
> > > Applying Cartesian function would give following tuples.
> > >
> > > { id: someID , fruits: apple }, { id: someID, fruits: orange }, {id:
> > > > someID, fruits: banana }
> > >
> > >
> > > Now that fruits holds single values, you can also use any Streaming
> > > Expression functions which don't work with multivalued fields. This
> > happens
> > > in the Streaming Expression pipeline so you don't have to flatten your
> > > documents in index.
> > >
> > > On Mon, Oct 30, 2017 at 8:39 AM, Kojo  wrote:
> > >
> > > > Hi,
> > > > just a question, I have no deep background on Solr, Graph etc.
> > > > This solution looks like normalizing data like a m2m table in sql
> > > database,
> > > > is it?
> > > >
> > > >
> > > >
> > > > 2017-10-29 21:51 GMT-02:00 Pratik Patel :
> > > >
> > > > > For now, you can probably use Cartesian function of Streaming
> > > Expressions
> > > > > which Joel implemented to solve the same problem.
> > > > >
> > > > > https://issues.apache.org/jira/browse/SOLR-10292
> > > > > http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-
> > > > > coming-in-solr-66.html
> > > > >
> > > > > Regards,
> > > > > Pratik
> > > > >
> > > > > On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <
> joels...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I don't see a jira ticket for this yet. Feel free to create it
> and
> > > > reply
> > > > > > back with the link.
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo 
> wrote:
> > > > > >
> > > > > > > Hi, I was looking for information on Graph Traversal. More
> > > > > specifically,
> > > > > > > support to search graph on multivalued field.
> > > > > > >
> > > > > > > Searching on the Internet, I found a question exactly the same
> of
> > > > mine,
> > > > > > > with an answer that what I need is not implemented yet:
> > > > > > > http://lucene.472066.n3.nabble.com/Using-multi-valued-
> > > > > > > field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
> > > > > > >
> > > > > > >
> > > > > > > Is there a ticket on Jira to follow the implementation of
> search
> > > > graph
> > > > > on
> > > > > > > multivalued field?
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


RE: LTR feature extraction performance issues

2017-10-31 Thread Brian Yee
Thank you Christine! I am still in the data gathering / model building phase 
and I have not yet re-ranked my results so that makes sense. It sounds like 
when I add re-ranking, the caching will start working. Thanks!

--Brian

-Original Message-
From: Christine Poerschke (BLOOMBERG/ LONDON) [mailto:cpoersc...@bloomberg.net] 
Sent: Tuesday, October 31, 2017 8:48 AM
To: solr-user@lucene.apache.org
Subject: RE: LTR feature extraction performance issues

Hi Brian,

I just tried to explore the scenario you describe with the techproducts example 
and am able to see what you see:

# step 1: start solr with techproducts example and ltr enabled
# step 2: upload one feature (originalScore) and one model using that feature
# step 3: examine cache stats via the Admin UI (all zero to start with)
# step 4: run a query which includes feature extraction e.g. [features] in fl
# step 5: examine cache stats to see lookups but no inserts
# step 6: run a query with feature extraction _and_ re-ranking using the model
# step 7: examine cache stats to see both lookups and inserts

Looking around the code the cache insert happens in FeatureLogger.java [1] 
which is called by the Rescorer [2] and this would allow the 'fl' feature 
logging to reuse the feature values calculated as part of the 'rq' re-ranking.

However, if there was no feature value in the cache (because no 'rq' re-ranking 
happened) then the feature value is calculated by 
LTRFeatureLoggerTransformerFactory.java [3] and based on code inspection the 
results of that calculation are not added to the cache.

It might be interesting to explore if/how that logic [3] could be changed.

--Christine

[1] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/FeatureLogger.java#L51-L60
[2] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java#L185-L205
[3] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java#L267-L280

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 10/30/17 16:55:14

I'm still having this issue. Does anyone have LTR feature extraction 
successfully running and have cache inserts/hits?

--Brian

-Original Message-
From: Brian Yee [mailto:b...@wayfair.com]
Sent: Tuesday, October 24, 2017 12:14 PM
To: solr-user@lucene.apache.org
Subject: RE: LTR feature extraction performance issues

Hi Alessandro,

Unfortunately some of my most important features are query dependent. I think I 
found an issue though. I don't think my features are being inserted into the 
cache. Notice "cumulative_inserts:0". There are a lot of lookups, but since 
there appear to be no values in the cache, the hitratio is 0.

stats:
cumulative_evictions:0
cumulative_hitratio:0
cumulative_hits:0
cumulative_inserts:0
cumulative_lookups:215319
evictions:0
hitratio:0
hits:0
inserts:0
lookups:3303
size:0
warmupTime:0


My configs are as follows:

<cache name="QUERY_DOC_FV" class="solr.search.LRUCache" ... />

<transformer name="features"
             class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
  <str name="defaultFormat">sparse</str>
</transformer>

Would anyone have any idea why my features are not being inserted into the 
cache? Is there an additional config setting I need?


--Brian

-Original Message-
From: alessandro.benedetti [mailto:a.benede...@sease.io]
Sent: Monday, October 23, 2017 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: LTR feature extraction performance issues

It strictly depends on the kind of features you are using.
At the moment there is just one cache for all the features.
This means that even if you have 1 query dependent feature and 100 document
dependent features, a different value for the query dependent one will
invalidate the cache entry for the full vector[1].

You may look to optimise your features ( where possible).

[1]  https://issues.apache.org/jira/browse/SOLR-10448



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: mvn test failing

2017-10-31 Thread Shawn Heisey
On 10/28/2017 11:48 PM, Tarique Anwer wrote:
> I am new to Solr.
> I am trying to build Solr from source code using Maven.
> So I performed the following steps:
>
> 1. Download the source code zip from https://github.com/apache/lucene-solr
> 2. unzip & run from top level dir:
>   $ ant get-maven-poms
> $ cd maven-build

Maven is not the official build system.  It is included as an alternate
option, but doesn't get the same attention as the official system.

The maven output gave you the location of more details about the tests
that failed.  Look there for more information.

Or install/use ant, which is the official build system for Lucene and
Solr, and gives more information about test failures as part of the
build output.

https://wiki.apache.org/solr/HowToContribute
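
For example, typical invocations against a lucene-solr checkout (a sketch):

ant test                                    # all tests, from the top-level dir
cd solr && ant test                         # only the Solr tests
ant test -Dtestcase=TestICUCollationField   # a single test class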

Sometimes Solr tests fail, even on released code.  Such failures are
investigated.  Sometimes the test itself is faulty, sometimes it's the
code that is being tested.  Sometimes the test framework randomizes a
combination of settings that doesn't actually work.

Unless you need to have access to the source code for learning purposes,
or because you need to engage in development related to Solr, download
the binary release and don't worry about compiling it.

Thanks,
Shawn



Re: Stateless queries to secured SOLR server.

2017-10-31 Thread Shawn Heisey
On 10/29/2017 6:13 PM, Phil Scadden wrote:
> While SOLR is behind a firewall, I want to now move to a secured SOLR 
> environment. I had been hoping to keep SOLRJ out of the picture and just 
> using httpURLConnection. However, I also don't want to maintain session 
> state, preferring to send authentication with every request. Is this possible 
> with basic Authorization?

I do not know a lot about the authentication in Solr, but I do know that
it's typically using HTTP basic authentication.  As I understand it, for
this kind of authentication, every request will require the credentials.

I am not aware of any state/session capability where Solr's HTTP API is
concerned.  As far as I know, the closest Solr comes to this is that
certain things, particularly the Collections API, are async capable,
where you start a process with one HTTP call and then you can make
further requests to check whether it's done.

If your software is written in Java, I would strongly recommend SolrJ,
rather than constructing the HTTP calls yourself.  The code is easier to
write and understand.  For other languages, there are third-party Solr
client libraries available.

Thanks,
Shawn



Re: Failed to create collection SOLR 6.3 HDP 2.6.2

2017-10-31 Thread Shawn Heisey
On 10/30/2017 2:45 PM, Dan Caulfield wrote:
> Thanks Shawn,
> I tried your recommended solution and deleted the maxis_clickstream
> directory.  I got the same error when trying to recreate the collection. 
> Can you think of anything else to try?

I am pretty sure that this problem happens because the data/index
directory for a core exists when Solr tries to create a core for the
collection that it has been asked to create.  The error messages I saw
in your original email indicated that the only file present in each of
the index directories at the time of the error was "write.lock".  I'm
pretty sure this could only happen if the index directory was already
there when Solr tried to create the index, but contained zero files
until Solr started the index creation.  Before you issue the CREATE call
via HTTP, are you doing any manual work on your HDFS filesystem?  If so,
you shouldn't do that.  When running in cloud mode, Solr will create all
of the files and directories it needs as part of the collection creation.
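
One quick sanity check before retrying the CREATE is to look for leftover
core directories (a sketch; the actual path depends on your solr.hdfs.home
setting):

hdfs dfs -ls /solr                       # or wherever solr.hdfs.home points
hdfs dfs -ls /solr/maxis_clickstream*    # stale dirs like ..._shard1_replica1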

I think it's a bug that Lucene/Solr refuses to create a new index when
the index directory exists but contains no files.  This has been the
behavior for a VERY long time, and nobody has tried to fix it yet.  When
the index directory exists, *does* contain files, but not the critical
files that Lucene expects, then that error would be the right thing to do.

I found an existing issue for the problem with empty index directories
and submitted a patch for the issue, but I haven't yet heard whether I
accomplished the fix correctly:

https://issues.apache.org/jira/browse/SOLR-8628

Thanks,
Shawn



RE: LTR feature extraction performance issues

2017-10-31 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Brian,

I just tried to explore the scenario you describe with the techproducts example 
and am able to see what you see:

# step 1: start solr with techproducts example and ltr enabled
# step 2: upload one feature (originalScore) and one model using that feature
# step 3: examine cache stats via the Admin UI (all zero to start with)
# step 4: run a query which includes feature extraction e.g. [features] in fl
# step 5: examine cache stats to see lookups but no inserts
# step 6: run a query with feature extraction _and_ re-ranking using the model
# step 7: examine cache stats to see both lookups and inserts
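
(As concrete requests, steps 4 and 6 look roughly like this; the model name
is illustrative:)

# step 4: feature extraction only -- cache lookups, no inserts
curl "http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features]"

# step 6: extraction plus LTR re-ranking -- lookups and inserts
curl "http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features]&rq={!ltr model=myModel reRankDocs=100}"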

Looking around the code the cache insert happens in FeatureLogger.java [1] 
which is called by the Rescorer [2] and this would allow the 'fl' feature 
logging to reuse the feature values calculated as part of the 'rq' re-ranking.

However, if there was no feature value in the cache (because no 'rq' re-ranking 
happened) then the feature value is calculated by 
LTRFeatureLoggerTransformerFactory.java [3] and based on code inspection the 
results of that calculation are not added to the cache.

It might be interesting to explore if/how that logic [3] could be changed.

--Christine

[1] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/FeatureLogger.java#L51-L60
[2] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java#L185-L205
[3] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java#L267-L280

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 10/30/17 16:55:14

I'm still having this issue. Does anyone have LTR feature extraction 
successfully running and have cache inserts/hits?

--Brian

-Original Message-
From: Brian Yee [mailto:b...@wayfair.com] 
Sent: Tuesday, October 24, 2017 12:14 PM
To: solr-user@lucene.apache.org
Subject: RE: LTR feature extraction performance issues

Hi Alessandro,

Unfortunately some of my most important features are query dependent. I think I 
found an issue though. I don't think my features are being inserted into the 
cache. Notice "cumulative_inserts:0". There are a lot of lookups, but since 
there appear to be no values in the cache, the hitratio is 0.

stats:
cumulative_evictions:0
cumulative_hitratio:0
cumulative_hits:0
cumulative_inserts:0
cumulative_lookups:215319
evictions:0
hitratio:0
hits:0
inserts:0
lookups:3303
size:0
warmupTime:0


My configs are as follows:

<cache name="QUERY_DOC_FV" class="solr.search.LRUCache" ... />

<transformer name="features"
             class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
  <str name="defaultFormat">sparse</str>
</transformer>

Would anyone have any idea why my features are not being inserted into the 
cache? Is there an additional config setting I need?


--Brian

-Original Message-
From: alessandro.benedetti [mailto:a.benede...@sease.io] 
Sent: Monday, October 23, 2017 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: LTR feature extraction performance issues

It strictly depends on the kind of features you are using.
At the moment there is just one cache for all the features.
This means that even if you have 1 query dependent feature and 100 document
dependent features, a different value for the query dependent one will
invalidate the cache entry for the full vector[1].

You may look to optimise your features ( where possible).

[1]  https://issues.apache.org/jira/browse/SOLR-10448



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Sum area polygon solr

2017-10-31 Thread Samur Araujo
Hi all, is it possible to sum the area of a polygon in solr?

Suppose I do a polygon intersect and I want to retrieve the total area of
the resulting polygon.

Is it possible?

Best,

-- 
Head of Data
Geophy
www.geophy.com

Nieuwe Plantage 54-55
2611XK  Delft
+31 (0)70 7640725

1 Fore Street
EC2Y 9DT  London
+44 (0)20 37690760


Re: Solr response with original value

2017-10-31 Thread Venkateswarlu Bommineni
Thanks for the reply.

Can you please tell me how to use copyField to return original values in the
query.

On 31 Oct 2017 5:29 pm, "Emir Arnautović" 
wrote:

> Hi Venkat,
> If you need this field for searching, then you need to use copyfield to
> copy this to some other field of type string that will be used for faceting:
>
> <copyField source="productSuggestion" dest="..." />
> <field name="..." type="string" indexed="true" stored="false" />
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 31 Oct 2017, at 12:15, Venkateswarlu Bommineni 
> wrote:
> >
> > Hello Team,
> >
> > Please suggest how to achieve below scenario.
> >
> > I have a field '*productSuggestion*' , below is the configuration.
> >
> > <field name="productSuggestion" type="..." indexed="true"
> > stored="true" multiValued="false" />
> >
> > <fieldType name="..." class="solr.TextField">
> >   <analyzer>
> >     ... (includes a LowerCaseFilterFactory) ...
> >   </analyzer>
> > </fieldType>
> >
> > when I use the above field in facets I am getting only indexed values
> > (lowercase values), but I need the original value.
> >
> > Example: original string: Life is Beautiful
> > when I search for 'life' as a facet prefix it gives me the result 'life
> > is beautiful'
> >
> > Could you please suggest a way to return the original value instead of the
> > indexed value?
> >
> > Thanks,
> > Venkat.
>
>


Re: Solr response with original value

2017-10-31 Thread Emir Arnautović
Hi Venkat,
If you need this field for searching, then you need to use copyfield to copy 
this to some other field of type string that will be used for faceting:

<copyField source="productSuggestion" dest="..." />
<field name="..." type="string" indexed="true" stored="false" />

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 31 Oct 2017, at 12:15, Venkateswarlu Bommineni  wrote:
> 
> Hello Team,
> 
> Please suggest how to achieve below scenario.
> 
> I have a field '*productSuggestion*' , below is the configuration.
> 
> <field name="productSuggestion" type="..." indexed="true"
> stored="true" multiValued="false" />
>
> <fieldType name="..." class="solr.TextField">
>   <analyzer>
>     ... (includes a LowerCaseFilterFactory) ...
>   </analyzer>
> </fieldType>
> 
> when I use the above field in facets I am getting only indexed values
> (lowercase values), but I need the original value.
> 
> Example: original string: Life is Beautiful
> when I search for 'life' as a facet prefix it gives me the result 'life
> is beautiful'
> 
> Could you please suggest a way to return the original value instead of the
> indexed value?
> 
> Thanks,
> Venkat.



Re: Incomplete Index

2017-10-31 Thread Emir Arnautović
Hi,
There is a possibility that you ended up with documents with the same ID and
that you are overwriting documents instead of writing new ones.

In any case, I would suggest you change your approach in case you have enough 
disk space to keep two copies of indices:
1. use alias to read data from index instead of index name
2. index data into new index
3. after verification (e.g. quick check would be number of docs) switch alias 
to new index
4. keep old index available in case you need to switch back.
5. before indexing next index, delete one from previous day to free up space.

In case you have updates during the day you have to account for that as well:
stop updating while indexing the new index; update both indices if you want to
be able to switch back at any point; etc.
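
The switch in step 3 is a single Collections API call; re-issuing CREATEALIAS
atomically repoints an existing alias (a sketch; names are illustrative):

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20171101"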

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 31 Oct 2017, at 11:20, o1webdawg  wrote:
> 
> I have an index with about a million documents.  It is the backend for a
> shopping cart system.
> 
> Sometimes the inventory gets out of sync with solr and the storefront
> contains out of stock items.
> 
> So I setup a scheduled task on the server to run at 12am every morning to
> delete the entire solr index.
> 
> Then at 12:04am I run another scheduled task to re-index the SQL database
> containing the inventory.
> 
> Well, today I check it around 4am and only a fraction of the products are in
> the solr index.
> 
> However, it did not seem to be idle and reloading it showed lots of deleted
> documents.
> 
> 
> I open up the core and the deletes keep going up, max docs goes up, but the
> total docs stays the same.
> 
> It's really confusing me what is happening at this point and why I am
> viewing these numbers of docs.
> 
> My theory is that the 12am delete is still running 5 hours later at the same
> time as the re-indexing.
> 
> That's the only way I can explain this really odd behavior with my limited
> knowledge.
> 
> Is my theory realistic and could the delete still be running?
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



max docs, deleted docs optimization

2017-10-31 Thread kshitij tyagi
Hi,

I am using atomic update to update one of the fields, I want to know :

1. if total docs in the core are 10 lakh and I partially update 2 lakh docs,
then what will be the number of deleted docs?

2. Does a higher number of deleted docs affect query time? That is, does
query time increase if there are more deleted docs?

3. Are deleted docs present in segments? Are deleted docs traversed
during query execution?

4. What exactly does the optimize button in the Solr admin UI do?

Help is much appreciated.

Regards,
Kshitij


Automatic creation of indexes

2017-10-31 Thread Jokin Cuadrado
Hi, I'm using solr to store time series data, log events etc. Right now I
use a solr cloud collection and cleaning it deleting documents via queries,
but I would like to know what approaches are other people using.
Is there a way to  create a collection when receiving a post to a
inexistent inded? So i could use the date as part of the index name, and
the cleanup process would be just to delete the old collections.
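
Solr does not create a collection on the fly when a document is posted to a
nonexistent one, so the per-date pattern is usually driven from outside (a
sketch; names and sizing are illustrative):

# create today's collection before the first post
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logs_20171031&numShards=4&replicationFactor=2&collection.configName=logs"

# cleanup: drop whole old collections instead of delete-by-query
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=logs_20171001"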


Solr response with original value

2017-10-31 Thread Venkateswarlu Bommineni
Hello Team,

Please suggest how to achieve below scenario.

I have a field '*productSuggestion*' , below is the configuration.

<field name="productSuggestion" type="..." indexed="true"
stored="true" multiValued="false" />

<fieldType name="..." class="solr.TextField">
  <analyzer>
    ... (includes a LowerCaseFilterFactory) ...
  </analyzer>
</fieldType>

when I use the above field in facets I am getting only indexed values
(lowercase values), but I need the original value.

Example: original string: Life is Beautiful
when I search for 'life' as a facet prefix it gives me the result 'life
is beautiful'

Could you please suggest a way to return the original value instead of the
indexed value?

Thanks,
Venkat.


Incomplete Index

2017-10-31 Thread o1webdawg
I have an index with about a million documents.  It is the backend for a
shopping cart system.

Sometimes the inventory gets out of sync with solr and the storefront
contains out of stock items.

So I setup a scheduled task on the server to run at 12am every morning to
delete the entire solr index.

Then at 12:04am I run another scheduled task to re-index the SQL database
containing the inventory.

Well, today I checked it around 4am and only a fraction of the products are in
the solr index.

However, it did not seem to be idle, and reloading it showed lots of deleted
documents.


I open up the core and the deletes keep going up, max docs goes up, but the
total docs stays the same.

It's really confusing to me what is happening at this point and why I am
seeing these document counts.

My theory is that the 12am delete is still running 5 hours later at the same
time as the re-indexing.

That's the only way I can explain this really odd behavior with my limited
knowledge.

Is my theory realistic and could the delete still be running?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html