Re: Adding the same field value question

2015-12-28 Thread Jamie Johnson
Yes, the field is multivalued.
On Dec 28, 2015 3:48 PM, "Jack Krupansky"  wrote:

> Is the field multivalued?
>
> -- Jack Krupansky
>
> On Sun, Dec 27, 2015 at 11:16 PM, Jamie Johnson  wrote:
>
> > What is the difference of adding a field with the same value twice or
> > adding it once and boosting the field on add?  Is there a situation where
> > one approach is preferred?
> >
> > Jamie
> >
>


Re: how's multi-query scoring?

2015-12-28 Thread Binoy Dalal
You can do this by specifying slop for your fields.
If you want to see how exactly your query is being treated you should use
the analysis tool that is available in the solr admin ui under your
collection name.

On Mon, 28 Dec 2015, 12:24 Jason  wrote:

> Hi, all
> I'm wondering how the multiple generated queries are scored.
>
> My schema setting is
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt"
>             enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="1"/>
>   </analyzer>
> </fieldType>
>
>
> If I query test:chloro-4-hydroxy with the default operator AND,
> the result is below:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">300</int>
>     <lst name="params">
>       <str name="...">true</str>
>       <str name="fl">*,score</str>
>       <str name="...">true</str>
>       <str name="q">test:chloro-4-hydroxy</str>
>       <str name="q.op">AND</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <result>
>     <doc>
>       <str name="id">test1</str>
>       <str name="test">chloro-4-hydroxy meaningless word</str>
>       <float name="score">1.8082676</float>
>     </doc>
>     <doc>
>       <str name="id">test2</str>
>       <str name="test">chloro 4 meaningless word hydroxy</str>
>       <float name="score">1.8082676</float>
>     </doc>
>   </result>
> </response>
>
>
> The scores of the two documents are the same.
> If possible, I want doc id 'test1' to score higher:
> 'chloro', '4', and 'hydroxy' are close to each other in 'test1',
> but 'hydroxy' is far from 'chloro' and '4' in 'test2'.
> I think 'test1' is a better match than 'test2'.
> Is there a way to give a higher score according to the distance among the
> query terms?
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-s-multi-query-scoring-tp4247512.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: how's multi-query scoring?

2015-12-28 Thread Binoy Dalal
Yes, you use the pf option under edismax.
Have you indexed the field with term frequencies and position data?
Because slop basically works with phrase queries for which you need the
term frequencies and positions available.
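A hedged sketch of an edismax request along those lines — the host, collection, and field names are assumptions, not taken from the thread:

```python
from urllib.parse import urlencode

# Hypothetical edismax request that rewards proximity: qf scores the
# plain term matches, while pf/ps re-score documents where the terms
# occur as a near-phrase (this is what needs indexed positions).
params = {
    "q": "chloro 4 hydroxy",
    "defType": "edismax",
    "qf": "test",
    "pf": "test",  # phrase-boost field
    "ps": "2",     # phrase slop: max positions the terms may be apart
    "fl": "*,score",
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)
```

With pf/ps set, a document whose terms sit close together (like test1) would typically outscore one where they are spread apart (like test2), provided the field was indexed with positions.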

On Mon, 28 Dec 2015, 15:02 Jason  wrote:

> I know the analysis tool under the solr admin ui.
>
> Specifying slop for query fields means using edismax, right?
>
> I have queried using edismax like below.
>
>
> http://localhost:8080/solr/collection1/select?q=test:(chloro-4-hydroxy)=*,score=true=AND=test=test~1
> ^2.0=edismax
>
> But the scores of the two docs are still the same.
>
> How can I do it the way you described?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-s-multi-query-scoring-tp4247512p4247517.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Stable Versions in Solr 4

2015-12-28 Thread Binoy Dalal
You should take a look at solr's jira.
That'll give you a pretty good idea of the various feature upgrades across
versions as well as the bugs present in the various versions.

On Mon, 28 Dec 2015, 17:42 abhi Abhishek  wrote:

> Hi All,
>    I am trying to determine a stable version of Solr 4. Is there a blog
> we can refer to? I understand we can read through the release notes. I am
> interested in user reviews and challenges seen with various versions of
> Solr 4.
>
>
> Appreciate your contribution.
>
> Thanks,
> Abhishek
>
-- 
Regards,
Binoy Dalal


Re: how's multi-query scoring?

2015-12-28 Thread Jason
I know the analysis tool under the solr admin ui.

Specifying slop for query fields means using edismax, right?

I have queried using edismax like below.

http://localhost:8080/solr/collection1/select?q=test:(chloro-4-hydroxy)=*,score=true=AND=test=test~1^2.0=edismax

But the scores of the two docs are still the same.

How can I do it the way you described?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-s-multi-query-scoring-tp4247512p4247517.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ( no servers hosting shard ) very strange

2015-12-28 Thread Binoy Dalal
Has changing the heap size fixed your GC problem?

As for the leader not being elected, I am not very sure about it, but if
there were some issue you'd see it in the solr logs as exceptions, so
you should check those.

On Mon, 28 Dec 2015, 15:21 elvis鱼人  wrote:

> Yes, I saw massive full GCs, so I changed the Java heap to -Xms10g -Xmx10g.
> And there is another problem:
> shard1:
>   192.168.100.210:7001 - leader
>   192.168.100.211:7001 - replica
>
> shard2:
>   192.168.100.211:7002 - leader
>   192.168.100.212:7001 - replica
>
> shard3:
>   192.168.100.210:7002 - leader
>   192.168.100.212:7002 - replica
> I shut down two servers, 210:7001 and 210:7002, and then the replicas
> couldn't elect a leader. I don't know why, and this doesn't happen every
> time.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/no-servers-hosting-shard-very-strange-tp4247349p4247518.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: ( no servers hosting shard ) very strange

2015-12-28 Thread elvis鱼人
Yes, I saw massive full GCs, so I changed the Java heap to -Xms10g -Xmx10g.
And there is another problem:
shard1:
  192.168.100.210:7001 - leader
  192.168.100.211:7001 - replica

shard2:
  192.168.100.211:7002 - leader
  192.168.100.212:7001 - replica

shard3:
  192.168.100.210:7002 - leader
  192.168.100.212:7002 - replica
I shut down two servers, 210:7001 and 210:7002, and then the replicas
couldn't elect a leader. I don't know why, and this doesn't happen every
time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/no-servers-hosting-shard-very-strange-tp4247349p4247518.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrMeter is still a feasible tool for measuring performances?

2015-12-28 Thread Gian Maria Ricci - aka Alkampfer
Hi, 

 

I've read on the Solr wiki that SolrMeter is no longer actively developed,
but I wonder if it is still valid for running performance tests, or if
there is a better approach / tool.

 

I'd like also to know where I can find the latest compiled version for
SolrMeter instead of compiling with maven. The release page on GitHub only
gives the source code
https://github.com/tflobbe/solrmeter/releases/tag/solrmeter-parent-0.3.0 

 

Thanks in advance for any help you can give me.

--
Gian Maria Ricci
Cell: +39 320 0136949

 

   


 



Re: SolrMeter is still a feasible tool for measuring performances?

2015-12-28 Thread Binoy Dalal
Hi Gian
We've been using SolrMeter to test the performance of Solr instances for
quite a while now, and in my experience it is pretty reliable.
Finding a compiled jar is difficult, but building from source is pretty
straightforward and will only take you a few minutes.

On Mon, 28 Dec 2015, 13:47 Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Hi,
>
>
>
> I've read on the Solr wiki that SolrMeter is no longer actively developed,
> but I wonder if it is still valid for running performance tests, or if
> there is a better approach / tool.
>
>
>
> I’d like also to know where I can find the latest compiled version for
> SolrMeter instead of compiling with maven. The release page on GitHub only
> gives the source code
> https://github.com/tflobbe/solrmeter/releases/tag/solrmeter-parent-0.3.0
>
>
>
> Thanks in advance for any help you can give me.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
-- 
Regards,
Binoy Dalal


Stable Versions in Solr 4

2015-12-28 Thread abhi Abhishek
Hi All,
   I am trying to determine a stable version of Solr 4. Is there a blog
we can refer to? I understand we can read through the release notes. I am
interested in user reviews and challenges seen with various versions of
Solr 4.


Appreciate your contribution.

Thanks,
Abhishek


Re: Stable Versions in Solr 4

2015-12-28 Thread Rajani Maski
Solr 4.10.3

On Mon, Dec 28, 2015 at 5:51 PM, Binoy Dalal  wrote:

> You should take a look at solr's jira.
> That'll give you a pretty good idea of the various feature upgrades across
> versions as well as the bugs present in the various versions.
>
> On Mon, 28 Dec 2015, 17:42 abhi Abhishek  wrote:
>
> > Hi All,
> >i am trying to determine stable version of SOLR 4. is there a blog
> which
> > we can refer.. i understand we can read through Release Notes. I am
> > interested in user reviews and challenges seen with various versions of
> > SOLR 4.
> >
> >
> > Appreciate your contribution.
> >
> > Thanks,
> > Abhishek
> >
> --
> Regards,
> Binoy Dalal
>


Re: Adding the same field value question

2015-12-28 Thread Jamie Johnson
Thanks, I wasn't sure if adding twice and boosting results in a similar
thing happening under the hood or not.  Appreciate the response.

Jamie
On Dec 28, 2015 9:08 AM, "Binoy Dalal"  wrote:

> There's no benefit in adding the same field twice because that'll just
> increase the size of your index without providing any real benefits at
> query time.
> For increasing the scores, boosting is definitely the way to go.
>
> On Mon, 28 Dec 2015, 09:46 Jamie Johnson  wrote:
>
> > What is the difference of adding a field with the same value twice or
> > adding it once and boosting the field on add?  Is there a situation where
> > one approach is preferred?
> >
> > Jamie
> >
> --
> Regards,
> Binoy Dalal
>


Solr - facet fields that contain other facet fields

2015-12-28 Thread Kevin Lopez
*What I am trying to accomplish: *
Generate a facet based on the documents uploaded and a text file containing
terms from a domain/ontology such that a facet is shown if a term is in the
text file and in a document (key phrase extraction).

*The problem:*
When I select the facet for the term "*not necessarily*" (we see there is a
space) and I get the results for the term "*not*". The field is tokenized
and multivalued. This leads me to believe that I can not use a tokenized
field as a facet field. I tried to copy the values of the field to a text
field with a keywordtokenizer. I am told when checking the schema browser:
"Sorry, no Term Info available :(" This is after I delete the old index and
upload the documents again. The facet is coming from a field that is
already copied from another field, so I cannot copy this field to a text
field with a keywordtokenizer or strfield. What can I do to fix this? Is
there an alternate way to accomplish this?

*Here is my configuration:*

[The schema XML was stripped by the mailing-list archiver; a more complete
copy survives in the quoted text of the replies below.]
Regards,

Kevin


Re: Adding the same field value question

2015-12-28 Thread Binoy Dalal
It will only be the same in a very few well-choreographed cases, or by
complete coincidence.
If you're interested in how this might happen you should take a look at how
lucene matches and scores docs based on your query.
https://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html

On Mon, 28 Dec 2015, 20:01 Jamie Johnson  wrote:

> Thanks, I wasn't sure if adding twice and boosting results in a similar
> thing happening under the hood or not.  Appreciate the response.
>
> Jamie
> On Dec 28, 2015 9:08 AM, "Binoy Dalal"  wrote:
>
> > There's no benefit in adding the same field twice because that'll just
> > increase the size of your index without providing any real benefits at
> > query time.
> > For increasing the scores, boosting is definitely the way to go.
> >
> > On Mon, 28 Dec 2015, 09:46 Jamie Johnson  wrote:
> >
> > > What is the difference of adding a field with the same value twice or
> > > adding it once and boosting the field on add?  Is there a situation
> where
> > > one approach is preferred?
> > >
> > > Jamie
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Jamie Johnson
Can you do the opposite?  Index into an unanalyzed field and copy into the
analyzed?

If I remember correctly facets are based off of indexed values so if you
tokenize the field then the facets will be as you are seeing now.
On Dec 28, 2015 9:45 AM, "Kevin Lopez"  wrote:

> *What I am trying to accomplish: *
> Generate a facet based on the documents uploaded and a text file containing
> terms from a domain/ontology such that a facet is shown if a term is in the
> text file and in a document (key phrase extraction).
>
> *The problem:*
> When I select the facet for the term "*not necessarily*" (we see there is a
> space) and I get the results for the term "*not*". The field is tokenized
> and multivalued. This leads me to believe that I can not use a tokenized
> field as a facet field. I tried to copy the values of the field to a text
> field with a keywordtokenizer. I am told when checking the schema browser:
> "Sorry, no Term Info available :(" This is after I delete the old index and
> upload the documents again. The facet is coming from a field that is
> already copied from another field, so I cannot copy this field to a text
> field with a keywordtokenizer or strfield. What can I do to fix this? Is
> there an alternate way to accomplish this?
>
> *Here is my configuration:*
>
> <field name="ColonCancerField" indexed="true" stored="true"
>        multiValued="true" type="Cytokine_Pass"
>        termPositions="true" termVectors="true" termOffsets="true"/>
> <field name="content" type="..." indexed="true" stored="true"
>        multiValued="true"/>
> <copyField source="content" dest="ColonCancerField"/>
>
> <fieldType name="Cytokine_Pass" class="solr.TextField"
>            sortMissingLast="true" omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>             maxShingleSize="5" outputUnigramsIfNoShingles="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms_ColonCancer.txt" ignoreCase="true"
>             expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.KeepWordFilterFactory"
>             words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> Regards,
>
> Kevin
>
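A minimal schema.xml sketch of Jamie's suggestion — index the exact phrase into an untokenized string field for faceting, and copy it into an analyzed field for searching. All names here are illustrative, not Kevin's actual schema:

```xml
<!-- Facet on the raw, untokenized values: each indexed term is the
     whole phrase, e.g. "not necessarily". -->
<field name="keyphrase"        type="string"       indexed="true" stored="true"  multiValued="true"/>
<!-- Search on an analyzed copy of the same values. -->
<field name="keyphrase_search" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="keyphrase" dest="keyphrase_search"/>
```

Faceting would then use facet.field=keyphrase, whose indexed terms are whole phrases, while queries and filters run against keyphrase_search.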


Re: Solr5.X document loss in splitting shards

2015-12-28 Thread GW
I don't use Curl but there are a couple of things that come to mind

1: Maybe use document routing with the shards. Use an "!" in your unique
ID. I'm using gmail to read this and it sucks for searching content so if
you have done this please ignore this point. Example: If you were storing
documents per domain you unique field values would look like
www.domain1.com!123,  www.domain1.com!124,
   www.domain2.com!35, etc.

This should create a two segment hash for searching shards. I do this in
blind faith as a best practice as it is mentioned in the docs.

2: Curl works best with URL encoding. I was using Curl at one time and I
noticed some strange results w/o url encoding

What are you using to write your client?

Best,

GW



On 27 December 2015 at 19:35, Shawn Heisey  wrote:

> On 12/26/2015 11:21 AM, Luca Quarello wrote:
> > I have a SOLR 5.3.1 CLOUD with two nodes and 8 shards per node.
> >
> > Each shard is about* 35 million documents (**35025882**) and 16GB sized.*
> >
> >
> >- I launch the SPLIT command on a shard (shard 13) in the ASYNC way:
>
> 
>
> > The new created shards have:
> > *13430316 documents (5.6 GB) and 13425924 documents (5.59 GB**)*.
>
> Where are you looking that shows you the source shard has 35 million
> documents?  Be extremely specific.
>
> The following screenshot shows one place you might be looking for this
> information -- the core overview page:
>
> https://www.dropbox.com/s/311n49wkp9kw7xa/admin-ui-core-overview.png?dl=0
>
> Is the core overview page where you are looking, or is it somewhere else?
>
> I'm asking because "Max Doc" and "Num Docs" on the core overview page
> mean very different things.  The difference between them is the number
> of deleted docs, and the split shards are probably missing those deleted
> docs.
>
> This is the only idea that I have.  If it's not that, then I'm as
> clueless as you are.
>
> Thanks,
> Shawn
>
>
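GW's two points — composite-id routing and URL-encoding the request — can be sketched together. The collection name and query are assumptions; `_route_` is SolrCloud's parameter for steering a request at the shard owning a given prefix:

```python
from urllib.parse import urlencode

# Point 1: composite-id routing. A "prefix!id" unique key makes
# SolrCloud hash the prefix, so documents sharing a prefix are
# co-located on one shard.
doc_ids = [f"{domain}!{n}" for domain, n in
           [("www.domain1.com", 123), ("www.domain1.com", 124),
            ("www.domain2.com", 35)]]

# Point 2: build the query string with proper URL encoding instead of
# hand-pasting raw strings into a curl command line.
params = {
    "q": "*:*",
    "_route_": "www.domain1.com!",  # query only that prefix's shard
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(doc_ids)
print(url)
```

Note that urlencode escapes characters like "!" and ":" that are easy to get wrong when assembling the URL by hand.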


Re: Solr 6 - Relational Index querying

2015-12-28 Thread Joel Bernstein
I'll add one important caveat:

At this time the /export handler does not support returning scores. In
order to join result sets you would typically need to be working with the
entire result sets from both sides of the join, which may be too slow
without the /export handler. But if you're working with smaller result sets
it will be possible to use the default /select handler which will return
scores.

Adding scores to the /export handler does need to get on the roadmap. The
initial release of the Streaming API was really designed for OLAP type
queries which typically don't involve scoring.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Dec 28, 2015 at 8:49 AM, Dennis Gove  wrote:

> There have been a lot of new features added to the Streaming API and the
> documentation hasn't kept pace, but it is something I'd like to have filled
> in by the release of Solr 6.
>
> With the Streaming API you can take two (or more) totally disconnected
> collections and get a result set with documents from one, both, or all of
> them. To be clear, when I say they can be totally disconnected I mean
> exactly that - the collections do not need to share any infrastructure or
> even know about each other in anyway. They can exist across any number of
> data centers, use completely different Zookeeper clusters, etc... No shared
> infrastructure is necessary. Updates/Inserts/Deletes to one of the
> collections has zero impact on the other collections.
>
> In your example, with Items and FacilityItems, I'd most likely construct a
> join like this (note, I'm using Streaming Expressions but the same would
> be possible in SQL).
>
> innerJoin(
>   search(items, fl="itemId,itemDescription", q="*:*", sort="itemId asc"),
>   search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
> sort="itemId asc"),
>   on="itemId"
> )
>
> This will return documents with the fields itemId, itemDescription,
> facilityName, and cost. Because it's an innerJoin only documents with parts
> found in both collections will be returned but if you want you can do a
> leftOuterJoin as well to get items which may not have facilityItems
> documents.
>
> Regarding the use of boosting - I'll assume that's because you're returning
> results in score order. I can't remember the syntax to use in the
> search(...) clause to tell it to search by score but for the sake of
> discussion let's assume that sort="score desc" would do that (ie, highest
> score first). This poses a problem on the innerJoin because as it is a
> merge based join it does expect the two incoming streams to be sorted by
> the same fields but with a score sort that isn't possible. However, we can
> instead use a hash based join to get around this limitation.
>
> hashJoin(
>   search(items, fl="itemId,itemDescription", q="itemDescription:bear",
> sort="score desc"),
>   hashed = search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
> sort="itemId asc"),
>   on="itemId"
> )
>
> Note that in this I've changed the first search clause by adding a q clause
> to find all where the description includes "bear" and to sort by the score.
> I've also marked the second search clause as the on that should be hashed.
> The stream that is marked to be hashed will be read in full and all
> documents stored in memory - for this reason you'll almost always want to
> hash the one with the fewest documents in it but do be aware that the order
> of the results will depend on the order of the non-hashed stream. For this
> reason I've hashed the one whose order we don't necessarily care about and
> am preserving the ordering by score.
>
> This will return the exact same documents but the order will now be by the
> score of the match found in the search over the items collections.
>
> - Dennis
>
> On Wed, Dec 23, 2015 at 10:43 PM, Troy Edwards 
> wrote:
>
> > In Solr 5.1.0 we had to flatten out two collections into one
> >
> > Item - about 1.5 million items with primary key - ItemId (this mainly
> > contains item description)
> >
> > FacilityItem - about 10,000 facilities - primary key - FacilityItemId
> > (pricing information for each facility) - ItemId points to Item
> >
> > We are currently using this index for only about 200 facilities. We are
> > using edismax parser to query and boost results
> >
> > I am hoping that in Solr 6 with Parallel SQL or stream innerJoin we can
> use
> > two collections so that it will be helpful in doing updates.
> >
> > But so far I have not seen something that will exactly fit what we need.
> >
> > Any thoughts/suggestions on what documentation to read or any samples on
> > how to approach what we are trying to achieve?
> >
> > Thanks
> >
>


Re: Solr 6 - Relational Index querying

2015-12-28 Thread Dennis Gove
There have been a lot of new features added to the Streaming API and the
documentation hasn't kept pace, but it is something I'd like to have filled
in by the release of Solr 6.

With the Streaming API you can take two (or more) totally disconnected
collections and get a result set with documents from one, both, or all of
them. To be clear, when I say they can be totally disconnected I mean
exactly that - the collections do not need to share any infrastructure or
even know about each other in anyway. They can exist across any number of
data centers, use completely different Zookeeper clusters, etc... No shared
infrastructure is necessary. Updates/Inserts/Deletes to one of the
collections has zero impact on the other collections.

In your example, with Items and FacilityItems, I'd most likely construct a
join like this (note, I'm using Streaming Expressions but the same would
be possible in SQL).

innerJoin(
  search(items, fl="itemId,itemDescription", q="*:*", sort="itemId asc"),
  search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
sort="itemId asc"),
  on="itemId"
)

This will return documents with the fields itemId, itemDescription,
facilityName, and cost. Because it's an innerJoin only documents with parts
found in both collections will be returned but if you want you can do a
leftOuterJoin as well to get items which may not have facilityItems
documents.

Regarding the use of boosting - I'll assume that's because you're returning
results in score order. I can't remember the syntax to use in the
search(...) clause to tell it to search by score but for the sake of
discussion let's assume that sort="score desc" would do that (ie, highest
score first). This poses a problem on the innerJoin because as it is a
merge based join it does expect the two incoming streams to be sorted by
the same fields but with a score sort that isn't possible. However, we can
instead use a hash based join to get around this limitation.

hashJoin(
  search(items, fl="itemId,itemDescription", q="itemDescription:bear",
sort="score desc"),
  hashed = search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
sort="itemId asc"),
  on="itemId"
)

Note that in this I've changed the first search clause by adding a q clause
to find all where the description includes "bear" and to sort by the score.
I've also marked the second search clause as the on that should be hashed.
The stream that is marked to be hashed will be read in full and all
documents stored in memory - for this reason you'll almost always want to
hash the one with the fewest documents in it but do be aware that the order
of the results will depend on the order of the non-hashed stream. For this
reason I've hashed the one whose order we don't necessarily care about and
am preserving the ordering by score.

This will return the exact same documents but the order will now be by the
score of the match found in the search over the items collections.

- Dennis

On Wed, Dec 23, 2015 at 10:43 PM, Troy Edwards 
wrote:

> In Solr 5.1.0 we had to flatten out two collections into one
>
> Item - about 1.5 million items with primary key - ItemId (this mainly
> contains item description)
>
> FacilityItem - about 10,000 facilities - primary key - FacilityItemId
> (pricing information for each facility) - ItemId points to Item
>
> We are currently using this index for only about 200 facilities. We are
> using edismax parser to query and boost results
>
> I am hoping that in Solr 6 with Parallel SQL or stream innerJoin we can use
> two collections so that it will be helpful in doing updates.
>
> But so far I have not seen something that will exactly fit what we need.
>
> Any thoughts/suggestions on what documentation to read or any samples on
> how to approach what we are trying to achieve?
>
> Thanks
>
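The contrast Dennis describes — a merge-based innerJoin that needs both streams sorted on the join key versus a hashJoin that buffers one side in memory and takes its output order from the other — can be sketched in miniature. This is an illustration of the two join strategies, not Solr's actual streaming implementation:

```python
def merge_inner_join(left, right, key):
    """Merge join: both inputs MUST be sorted on `key` (like Solr's
    innerJoin); walks each stream once with no buffering."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append({**left[i], **right[j]})
            # Simplified: assumes the left side has unique keys;
            # Solr handles many-to-many matches.
            j += 1
    return out

def hash_inner_join(stream, hashed, key):
    """Hash join: `hashed` is read fully into memory (like the
    hashed= branch of Solr's hashJoin); output keeps `stream`'s
    order, so `stream` may be sorted by score instead of `key`."""
    table = {}
    for doc in hashed:
        table.setdefault(doc[key], []).append(doc)
    return [{**doc, **match}
            for doc in stream
            for match in table.get(doc[key], [])]

items = [{"itemId": 1, "desc": "bear"}, {"itemId": 2, "desc": "toy"}]
facility = [{"itemId": 1, "cost": 9.5}, {"itemId": 3, "cost": 4.0}]
print(merge_inner_join(items, facility, "itemId"))
print(hash_inner_join(items, facility, "itemId"))
```

Both joins return the same matched documents here; the difference is the sort contract on the inputs and the memory cost of the hashed side.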


Re: ( no servers hosting shard ) very strange

2015-12-28 Thread elvis鱼人
Yes, it just may be fixed.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/no-servers-hosting-shard-very-strange-tp4247349p4247525.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding the same field value question

2015-12-28 Thread Binoy Dalal
There's no benefit in adding the same field twice because that'll just
increase the size of your index without providing any real benefits at
query time.
For increasing the scores, boosting is definitely the way to go.

On Mon, 28 Dec 2015, 09:46 Jamie Johnson  wrote:

> What is the difference of adding a field with the same value twice or
> adding it once and boosting the field on add?  Is there a situation where
> one approach is preferred?
>
> Jamie
>
-- 
Regards,
Binoy Dalal
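A toy illustration of the mechanical difference between indexing a value twice and boosting it once, using the classic Lucene TF-IDF shape (term frequency enters the score sublinearly as sqrt(freq), while a boost multiplies linearly). This is a deliberate simplification of the real Similarity, for intuition only:

```python
import math

def toy_tf_score(freq, boost=1.0):
    """Classic-TFIDF-flavoured sketch: sublinear tf, linear boost.
    Ignores idf, length norms, and queryNorm entirely."""
    return boost * math.sqrt(freq)

base    = toy_tf_score(freq=1)               # value indexed once
doubled = toy_tf_score(freq=2)               # same value indexed twice
boosted = toy_tf_score(freq=1, boost=2.0)    # indexed once, boost 2.0

# Duplicating the value only buys ~1.41x (sqrt(2)); a 2.0 boost buys
# exactly 2x — so the two coincide only in contrived cases.
print(base, doubled, boosted)
```

This is why duplicating a field value bloats the index for a smaller, less controllable effect than an explicit boost.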


Re: Solr 6 - Relational Index querying

2015-12-28 Thread Dennis Gove
Correct me if I'm wrong but I believe one can use the /export and /select
handlers interchangeably within a single streaming expression. This could
allow you to use the /select handler in the search(...) clause where a
score is necessary and the /export handler in the search(...) clauses where
it is not. Assuming the query in the clause with the score is limiting the
resultset to a reasonable size this might be able to get you around the
performance problems in using the /select handler in potentially other
large streams which we are joining with.

On Mon, Dec 28, 2015 at 9:11 AM, Joel Bernstein  wrote:

> I'll add one important caveat:
>
> At this time the /export handler does not support returning scores. In
> order to join result sets you would typically need to be working with the
> entire result sets from both sides of the join, which may be too slow
> without the /export handler. But if you're working with smaller result sets
> it will be possible to use the default /select handler which will return
> scores.
>
> Adding scores to the /export handler does need to get on the roadmap. The
> initial release of the Streaming API was really designed for OLAP type
> queries which typically don't involve scoring.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Dec 28, 2015 at 8:49 AM, Dennis Gove  wrote:
>
> > There have been a lot of new features added to the Streaming API and the
> > documentation hasn't kept pace, but it is something I'd like to have
> filled
> > in by the release of Solr 6.
> >
> > With the Streaming API you can take two (or more) totally disconnected
> > collections and get a result set with documents from one, both, or all of
> > them. To be clear, when I say they can be totally disconnected I mean
> > exactly that - the collections do not need to share any infrastructure or
> > even know about each other in anyway. They can exist across any number of
> > data centers, use completely different Zookeeper clusters, etc... No
> shared
> > infrastructure is necessary. Updates/Inserts/Deletes to one of the
> > collections has zero impact on the other collections.
> >
> > In your example, with Items and FacilityItems, I'd most likely construct
> a
> > join like this (note, I'm using Streaming Expresssions but the same would
> > be possible in SQL).
> >
> > innerJoin(
> >   search(items, fl="itemId,itemDescription", q="*:*", sort="itemId asc"),
> >   search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
> > sort="itemId asc"),
> >   on="itemId"
> > )
> >
> > This will return documents with the fields itemId, itemDescription,
> > facilityName, and cost. Because it's an innerJoin only documents with
> parts
> > found in both collections will be returned but if you want you can do a
> > leftOuterJoin as well to get items which may not have facilityItems
> > documents.
> >
> > Regarding the use of boosting - I'll assume that's because you're
> returning
> > results in score order. I can't remember the syntax to use in the
> > search(...) clause to tell it to search by score but for the sake of
> > discussion let's assume that sort="score desc" would do that (ie, highest
> > score first). This poses a problem on the innerJoin because as it is a
> > merge based join it does expect the two incoming streams to be sorted by
> > the same fields but with a score sort that isn't possible. However, we
> can
> > instead use a hash based join to get around this limitation.
> >
> > hashJoin(
> >   search(items, fl="itemId,itemDescription", q="itemDescription:bear",
> > sort="score desc"),
> >   hashed = search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
> > sort="itemId asc"),
> >   on="itemId"
> > )
> >
> > Note that in this I've changed the first search clause by adding a q
> clause
> > to find all where the description includes "bear" and to sort by the
> score.
> > I've also marked the second search clause as the on that should be
> hashed.
> > The stream that is marked to be hashed will be read in full and all
> > documents stored in memory - for this reason you'll almost always want to
> > hash the one with the fewest documents in it but do be aware that the
> order
> > of the results will depend on the order of the non-hashed stream. For
> this
> > reason I've hashed the one whose order we don't necessarily care about
> and
> > am preserving the ordering by score.
> >
> > This will return the exact same documents but the order will now be by
> the
> > score of the match found in the search over the items collections.
> >
> > - Dennis
> >
> > On Wed, Dec 23, 2015 at 10:43 PM, Troy Edwards  >
> > wrote:
> >
> > > In Solr 5.1.0 we had to flatten out two collections into one
> > >
> > > Item - about 1.5 million items with primary key - ItemId (this mainly
> > > contains item description)
> > >
> > > FacilityItem - about 10,000 facilities - primary key - FacilityItemId
> > > (pricing information for each 

Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Kevin Lopez
I am not sure I am following correctly. The field I upload the document to
would be "content" the analyzed field is "ColonCancerField". The "content"
field contains the entire text of the document, in my case a pubmed
abstract. This is a tokenized field. I made this field untokenized and I
still received the same results [the results for not instead of not
necessarily (in my current example I have 2 docs with not and 1 doc with
not necessarily {not is of course in the document that contains not
necessarily})]:

http://imgur.com/a/1bfXT

I also tried this:

http://localhost:8983/solr/Cytokine/select?q=ColonCancerField:"not+necessarily"

I still receive the two documents, which is the same as doing
ColonCancerField:"not"

Just to clarify the structure looks like this: *content (untokenized,
unanalyzed)* [copied to]==> *ColonCancerField *(tokenized, analyzed) then I
browse the ColonCancerField and the facets state that there is 1 document
for not necessarily, but when selecting it, solr returns 2 results.

-Kevin

On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson  wrote:

> Can you do the opposite?  Index into an unanalyzed field and copy into the
> analyzed?
>
> If I remember correctly facets are based off of indexed values so if you
> tokenize the field then the facets will be as you are seeing now.
> On Dec 28, 2015 9:45 AM, "Kevin Lopez"  wrote:
>
> > *What I am trying to accomplish: *
> > Generate a facet based on the documents uploaded and a text file
> containing
> > terms from a domain/ontology such that a facet is shown if a term is in
> the
> > text file and in a document (key phrase extraction).
> >
> > *The problem:*
> > When I select the facet for the term "*not necessarily*" (we see there
> is a
> > space) and I get the results for the term "*not*". The field is tokenized
> > and multivalued. This leads me to believe that I can not use a tokenized
> > field as a facet field. I tried to copy the values of the field to a text
> > field with a keywordtokenizer. I am told when checking the schema
> browser:
> > "Sorry, no Term Info available :(" This is after I delete the old index
> and
> > upload the documents again. The facet is coming from a field that is
> > already copied from another field, so I cannot copy this field to a text
> > field with a keywordtokenizer or strfield. What can I do to fix this? Is
> > there an alternate way to accomplish this?
> >
> > *Here is my configuration:*
> >
> > 
> >
> >  > multiValued="true" type="Cytokine_Pass"/>
> > 
> > 
> > 
> > 
> > 
> >
> >> stored="true" multiValued="true"
> >termPositions="true"
> >termVectors="true"
> >termOffsets="true"/>
> >  > sortMissingLast="true" omitNorms="true">
> > 
> >  > minShingleSize="2" maxShingleSize="5"
> > outputUnigramsIfNoShingles="true"
> > />
> >   
> >   
> >  > synonyms="synonyms_ColonCancer.txt" ignoreCase="true" expand="true"
> > tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >  > words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
> >   
> > 
> > 
> >
> > Regards,
> >
> > Kevin
> >
>


Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Binoy Dalal
1) When faceting, use a field of type string. That'll rid you of your
tokenization problems.
Alternatively, do not use any tokenizers.
Also turn on docValues for the field. It'll improve performance.
2) If, however, you do need to use a tokenized field for faceting, make sure
the fields are pretty short in terms of number of tokens or else your app
will die real soon.
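To see concretely why a shingled field makes a confusing facet field: facets count indexed terms, and a shingle filter emits overlapping terms. A minimal sketch (illustrative only, not Solr's ShingleFilterFactory; unigram output is assumed on, as it is by default):

```python
# Toy shingle generator mimicking minShingleSize=2 / maxShingleSize=5.
# Solr's ShingleFilterFactory also emits the unigrams by default.
def shingles(tokens, min_size=2, max_size=5):
    terms = list(tokens)  # the unigrams become facet terms too
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms

# Both "not" and "not necessarily" end up as distinct indexed terms,
# so both appear as facet values over the same document.
print(shingles(["not", "necessarily"]))
# ['not', 'necessarily', 'not necessarily']
```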

On Mon, 28 Dec 2015, 22:24 Kevin Lopez  wrote:

> I am not sure I am following correctly. The field I upload the document to
> would be "content" the analyzed field is "ColonCancerField". The "content"
> field contains the entire text of the document, in my case a pubmed
> abstract. This is a tokenized field. I made this field untokenized and I
> still received the same results [the results for not instead of not
> necessarily (in my current example I have 2 docs with not and 1 doc with
> not necessarily {not is of course in the document that contains not
> necessarily})]:
>
> http://imgur.com/a/1bfXT
>
> I also tried this:
>
> http://localhost:8983/solr/Cytokine/select?=ColonCancerField
> :"not+necessarily"
>
> I still receive the two documents, which is the same as doing
> ColonCancerField:"not"
>
> Just to clarify the structure looks like this: *content (untokenized,
> unanalyzed)* [copied to]==> *ColonCancerField *(tokenized, analyzed) then I
> browse the ColonCancerField and the facets state that there is 1 document
> for not necessarily, but when selecting it, solr returns 2 results.
>
> -Kevin
>
> On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson  wrote:
>
> > Can you do the opposite?  Index into an unanalyzed field and copy into
> the
> > analyzed?
> >
> > If I remember correctly facets are based off of indexed values so if you
> > tokenize the field then the facets will be as you are seeing now.
> > On Dec 28, 2015 9:45 AM, "Kevin Lopez"  wrote:
> >
> > > *What I am trying to accomplish: *
> > > Generate a facet based on the documents uploaded and a text file
> > containing
> > > terms from a domain/ontology such that a facet is shown if a term is in
> > the
> > > text file and in a document (key phrase extraction).
> > >
> > > *The problem:*
> > > When I select the facet for the term "*not necessarily*" (we see there
> > is a
> > > space) and I get the results for the term "*not*". The field is
> tokenized
> > > and multivalued. This leads me to believe that I can not use a
> tokenized
> > > field as a facet field. I tried to copy the values of the field to a
> text
> > > field with a keywordtokenizer. I am told when checking the schema
> > browser:
> > > "Sorry, no Term Info available :(" This is after I delete the old index
> > and
> > > upload the documents again. The facet is coming from a field that is
> > > already copied from another field, so I cannot copy this field to a
> text
> > > field with a keywordtokenizer or strfield. What can I do to fix this?
> Is
> > > there an alternate way to accomplish this?
> > >
> > > *Here is my configuration:*
> > >
> > > 
> > >
> > >  > > multiValued="true" type="Cytokine_Pass"/>
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > >> > stored="true" multiValued="true"
> > >termPositions="true"
> > >termVectors="true"
> > >termOffsets="true"/>
> > >  > > sortMissingLast="true" omitNorms="true">
> > > 
> > >  > > minShingleSize="2" maxShingleSize="5"
> > > outputUnigramsIfNoShingles="true"
> > > />
> > >   
> > >   
> > >  > > synonyms="synonyms_ColonCancer.txt" ignoreCase="true" expand="true"
> > > tokenizerFactory="solr.KeywordTokenizerFactory"/>
> > >  > > words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
> > >   
> > > 
> > > 
> > >
> > > Regards,
> > >
> > > Kevin
> > >
> >
>
-- 
Regards,
Binoy Dalal


Re: Solr - facet fields that contain other facet fields

2015-12-28 Thread Erick Erickson
bq:  so I cannot copy this field to a text field with a
keywordtokenizer or strfield

1> There is no restriction on whether a field is analyzed or not as far as
faceting is concerned. You can freely facet on an analyzed field
or String field or KeywordTokenized field. As Binoy says, though,
faceting on large analyzed text fields is dangerous.

2> copyField directives are not chained. As soon as the
field is received, before _anything_ is done the raw contents are
pushed to the copyField destinations. So in your case the source
for both copyField directives should be "content". Otherwise you
get into interesting behavior if you, say,  copyField from A to B and
have another copyField from B to A. I _suspect_ this is
why you have no term info available, but check

3> This is not going to work as you're trying to implement it. If you
tokenize, the only available terms are "not" and "necessarily". There
is no "not necessarily" _token_ to facet on. If you use a String
or KeywordAnalyzed field, likewise there is no "not necessarily"
token, there will be a _single_ token that's the entire content of the field
(I'm leaving aside, for instance, WordDelimiterFilterFactory
modifications...).

One way to approach this would be to recognize and index synthetic
tokens representing the concepts. You'd pre-analyze the text, do your
entity recognition and add those entities to a special "entity" field or
some such. This would be an unanalyzed field that you facet on. Let's
say your entity was "colon cancer". Whenever you recognized that in
the text during indexing, you'd index "colon_cancer", or "disease_234"
in your special field.

Of course your app would then have to present this pleasingly, and
rather than the app needing access to your dictionary the "colon_cancer"
form would be easier to unpack.

The fragility here is that changing your text file of entities would require
you to re-index to re-inject them into documents.

You could also, assuming you know all the entities that should match
a given query form facet _queries_ on the phrases. This could get to be
quite a large query, but has the advantage of not requiring re-indexing.
So you'd have something like
facet.query=field:"not necessarily"&facet.query=field:certainly
etc.

Best,
Erick
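A minimal sketch of the synthetic-token idea (the entity list and field names here are made up for illustration; the real work would happen in your indexing pipeline before documents are sent to Solr):

```python
# Hypothetical entity-recognition step run before indexing.
# Each recognized phrase contributes one synthetic token to an
# unanalyzed "entity" field, which is the field you then facet on.
ENTITIES = {"colon cancer": "colon_cancer", "not necessarily": "not_necessarily"}

def synthetic_tokens(text):
    lowered = text.lower()
    return [tok for phrase, tok in ENTITIES.items() if phrase in lowered]

doc = {"content": "Smoking is not necessarily linked to colon cancer."}
doc["entity"] = synthetic_tokens(doc["content"])
print(doc["entity"])  # ['colon_cancer', 'not_necessarily']
```

Faceting on "entity" then yields one clean facet value per concept, at the cost of re-indexing whenever the entity file changes.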


On Mon, Dec 28, 2015 at 9:13 AM, Binoy Dalal  wrote:
> 1) When faceting use field of type string. That'll rid you of your
> tokenization problems.
> Alternatively do not use any tokenizers.
> Also turn doc values on for the field. It'll improve performance.
> 2) If however you do need to use a tokenized field for faceting, make sure
> that they're pretty short in terms of number of tokens or else your app
> will die real soon.
>
> On Mon, 28 Dec 2015, 22:24 Kevin Lopez  wrote:
>
>> I am not sure I am following correctly. The field I upload the document to
>> would be "content" the analyzed field is "ColonCancerField". The "content"
>> field contains the entire text of the document, in my case a pubmed
>> abstract. This is a tokenized field. I made this field untokenized and I
>> still received the same results [the results for not instead of not
>> necessarily (in my current example I have 2 docs with not and 1 doc with
>> not necessarily {not is of course in the document that contains not
>> necessarily})]:
>>
>> http://imgur.com/a/1bfXT
>>
>> I also tried this:
>>
>> http://localhost:8983/solr/Cytokine/select?=ColonCancerField
>> :"not+necessarily"
>>
>> I still receive the two documents, which is the same as doing
>> ColonCancerField:"not"
>>
>> Just to clarify the structure looks like this: *content (untokenized,
>> unanalyzed)* [copied to]==> *ColonCancerField *(tokenized, analyzed) then I
>> browse the ColonCancerField and the facets state that there is 1 document
>> for not necessarily, but when selecting it, solr returns 2 results.
>>
>> -Kevin
>>
>> On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson  wrote:
>>
>> > Can you do the opposite?  Index into an unanalyzed field and copy into
>> the
>> > analyzed?
>> >
>> > If I remember correctly facets are based off of indexed values so if you
>> > tokenize the field then the facets will be as you are seeing now.
>> > On Dec 28, 2015 9:45 AM, "Kevin Lopez"  wrote:
>> >
>> > > *What I am trying to accomplish: *
>> > > Generate a facet based on the documents uploaded and a text file
>> > containing
>> > > terms from a domain/ontology such that a facet is shown if a term is in
>> > the
>> > > text file and in a document (key phrase extraction).
>> > >
>> > > *The problem:*
>> > > When I select the facet for the term "*not necessarily*" (we see there
>> > is a
>> > > space) and I get the results for the term "*not*". The field is
>> tokenized
>> > > and multivalued. This leads me to believe that I can not use a
>> tokenized
>> > > field as a facet field. I tried to copy the values of the field to a
>> text
>> > > field with a keywordtokenizer. I 

Re: SolrMeter is still a feasible tool for measuring performances?

2015-12-28 Thread Erick Erickson
SolrMeter has some pretty cool features, one of which is to extract
queries from existing Solr logs. If the Solr logging patterns have
changed, which they do, that may require some fixing up...

Let us know...

Erick

On Mon, Dec 28, 2015 at 12:25 AM, Binoy Dalal  wrote:
> Hi Gian
> We've been using SolrMeter to test the performance of Solr instances for quite
> a while now and in my experience it is pretty reliable.
> Finding a compiled jar is difficult but building from the code is pretty
> straightforward and will only take you a few minutes.
>
> On Mon, 28 Dec 2015, 13:47 Gian Maria Ricci - aka Alkampfer <
> alkamp...@nablasoft.com> wrote:
>
>> Hi,
>>
>>
>>
>> I’ve read on the Solr wiki that SolrMeter is not actively developed anymore, but
>> I wonder if it is still valid for doing some performance tests or if there is
>> some better approach / tool.
>>
>>
>>
>> I’d like also to know where I can find the latest compiled version for
>> SolrMeter instead of compiling with maven. The release page on GitHub only
>> gives the source code
>> https://github.com/tflobbe/solrmeter/releases/tag/solrmeter-parent-0.3.0
>>
>>
>>
>> Thanks in advance for any help you can give me.
>>
>> --
>> Gian Maria Ricci
>> Cell: +39 320 0136949
>>
>>
>>
>>
> --
> Regards,
> Binoy Dalal


Re: SolrMeter is still a feasible tool for measuring performances?

2015-12-28 Thread Binoy Dalal
SolrMeter works very well with Solr 4.10.4, including the query extraction
feature.
We've been using it for quite a while now.
You should give it a try. It won't take very long to set up and use.

On Mon, 28 Dec 2015, 23:23 Erick Erickson  wrote:

> SolrMeter has some pretty cool features, one of which is to extract
> queries from existing Solr logs. If the Solr logging patterns have
> changed, which they do, that may require some fixing up...
>
> Let us know...
>
> Erick
>
> On Mon, Dec 28, 2015 at 12:25 AM, Binoy Dalal 
> wrote:
> > Hi Gian
> > We've been using SolrMeter to test the performance of Solr instances for
> quite
> > a while now and in my experience it is pretty reliable.
> > Finding a compiled jar is difficult but building from the code is pretty
> > straightforward and will only take you a few minutes.
> >
> > On Mon, 28 Dec 2015, 13:47 Gian Maria Ricci - aka Alkampfer <
> > alkamp...@nablasoft.com> wrote:
> >
> >> Hi,
> >>
> >>
> >>
> >> I’ve read on the Solr wiki that SolrMeter is not actively developed anymore,
> but
> >> I wonder if it is still valid to do some performance test or if there is
> >> some better approach / tool.
> >>
> >>
> >>
> >> I’d like also to know where I can find the latest compiled version for
> >> SolrMeter instead of compiling with maven. The release page on GitHub
> only
> >> gives the source code
> >>
> https://github.com/tflobbe/solrmeter/releases/tag/solrmeter-parent-0.3.0
> >>
> >>
> >>
> >> Thanks in advance for any help you can give me.
> >>
> >> --
> >> Gian Maria Ricci
> >> Cell: +39 320 0136949
> >>
> >>
> >>
> >>
> > --
> > Regards,
> > Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Changing Solr Schema with Data

2015-12-28 Thread Salman Ansari
Hi,

I am facing an issue where I need to change Solr schema but I have crucial
data that I don't want to delete. Is there a way where I can change the
schema of the index while keeping the data intact?

Regards,
Salman


Re: Changing Solr Schema with Data

2015-12-28 Thread Alexandre Rafalovitch
Does the schema change affect the data you want to keep?

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 29 December 2015 at 01:48, Salman Ansari  wrote:
> Hi,
>
> I am facing an issue where I need to change Solr schema but I have crucial
> data that I don't want to delete. Is there a way where I can change the
> schema of the index while keeping the data intact?
>
> Regards,
> Salman


Re: Solr 6 - Relational Index querying

2015-12-28 Thread Joel Bernstein
Yes, that would work. Each search(...) has its own specific params and can
point to any handler that conforms to the output format of the /select
handler.


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Dec 28, 2015 at 11:12 AM, Dennis Gove  wrote:

> Correct me if I'm wrong but I believe one can use the /export and /select
> handlers interchangeably within a single streaming expression. This could
> allow you to use the /select handler in the search(...) clause where a
> score is necessary and the /export handler in the search(...) clauses where
> it is not. Assuming the query in the clause with the score is limiting the
> resultset to a reasonable size this might be able to get you around the
> performance problems in using the /select handler in potentially other
> large streams which we are joining with.
>
> On Mon, Dec 28, 2015 at 9:11 AM, Joel Bernstein 
> wrote:
>
> > I'll add one important caveat:
> >
> > At this time the /export handler does not support returning scores. In
> > order to join result sets you would typically need to be working with the
> > entire result sets from both sides of the join, which may be too slow
> > without the /export handler. But if you're working with smaller result
> sets
> > it will be possible to use the default /select handler which will return
> > scores.
> >
> > Adding scores to the /export handler does need to get on the roadmap. The
> > initial release of the Streaming API was really designed for OLAP type
> > queries which typically don't involve scoring.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Dec 28, 2015 at 8:49 AM, Dennis Gove  wrote:
> >
> > > There have been a lot of new features added to the Streaming API and
> the
> > > documentation hasn't kept pace, but it is something I'd like to have
> > filled
> > > in by the release of Solr 6.
> > >
> > > With the Streaming API you can take two (or more) totally disconnected
> > > collections and get a result set with documents from one, both, or all
> of
> > > them. To be clear, when I say they can be totally disconnected I mean
> > > exactly that - the collections do not need to share any infrastructure
> or
> > > even know about each other in anyway. They can exist across any number
> of
> > > data centers, use completely different Zookeeper clusters, etc... No
> > shared
> > > infrastructure is necessary. Updates/Inserts/Deletes to one of the
> > > collections has zero impact on the other collections.
> > >
> > > In your example, with Items and FacilityItems, I'd most likely
> construct
> > a
> > > join like this (note, I'm using Streaming Expresssions but the same
> would
> > > be possible in SQL).
> > >
> > > innerJoin(
> > >   search(items, fl="itemId,itemDescription", q="*:*", sort="itemId
> asc"),
> > >   search(facilityItems, fl="itemId,facilityName,cost", q="*:*",
> > > sort="itemId asc"),
> > >   on="itemId"
> > > )
> > >
> > > This will return documents with the fields itemId, itemDescription,
> > > facilityName, and cost. Because it's an innerJoin only documents with
> > parts
> > > found in both collections will be returned but if you want you can do a
> > > leftOuterJoin as well to get items which may not have facilityItems
> > > documents.
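The merge-based innerJoin relies on both streams arriving sorted by the join key, which is why both search(...) clauses above sort by itemId. A minimal sketch of the semantics (illustrative only, not Solr's implementation; unique keys per stream assumed for brevity):

```python
# Merge join over two streams already sorted by the join key.
# This is why innerJoin needs sort="itemId asc" on both sides.
def inner_join(left, right, on):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][on], right[j][on]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append({**left[i], **right[j]})  # merge matching docs
            i += 1
            j += 1  # assumes unique keys per stream for brevity
    return out

items = [{"itemId": 1, "itemDescription": "bear trap"},
         {"itemId": 2, "itemDescription": "teddy bear"},
         {"itemId": 3, "itemDescription": "honey pot"}]
facility_items = [{"itemId": 1, "facilityName": "A", "cost": 5.0},
                  {"itemId": 3, "facilityName": "B", "cost": 2.0}]

for row in inner_join(items, facility_items, on="itemId"):
    print(row["itemId"], row["itemDescription"], row["facilityName"], row["cost"])
# 1 bear trap A 5.0
# 3 honey pot B 2.0
```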
> > >
> > > Regarding the use of boosting - I'll assume that's because you're
> > returning
> > > results in score order. I can't remember the syntax to use in the
> > > search(...) clause to tell it to search by score but for the sake of
> > > discussion let's assume that sort="score desc" would do that (ie,
> highest
> > > score first). This poses a problem on the innerJoin because as it is a
> > > merge based join it does expect the two incoming streams to be sorted
> by
> > > the same fields but with a score sort that isn't possible. However, we
> > can
> > > instead use a hash based join to get around this limitation.
> > >
> > > hashJoin(
> > >   search(items, fl="itemId,itemDescription", q="itemDescription:bear",
> > > sort="score desc"),
> > >   hashed = search(facilityItems, fl="itemId,facilityName,cost",
> q="*:*",
> > > sort="itemId asc"),
> > >   on="itemId"
> > > )
> > >
> > > Note that in this I've changed the first search clause by adding a q
> > clause
> > > to find all where the description includes "bear" and to sort by the
> > score.
> > > I've also marked the second search clause as the one that should be
> > hashed.
> > > The stream that is marked to be hashed will be read in full and all
> > > documents stored in memory - for this reason you'll almost always want
> to
> > > hash the one with the fewest documents in it but do be aware that the
> > order
> > > of the results will depend on the order of the non-hashed stream. For
> > this
> > > reason I've hashed the one whose order we don't necessarily care about
> > and
> > > am preserving the ordering by score.
> > >
> > > This will return the exact same documents but the order will now be by
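A sketch of the hashJoin semantics described above (illustrative only, not Solr code): the hashed stream is read fully into memory, and output order follows the non-hashed stream, which is what preserves the score ordering:

```python
# Hash join: materialize the "hashed" stream in memory keyed by the
# join field, then scan the other stream once; output order follows
# the streamed (non-hashed) side.
def hash_join(stream, hashed_stream, on):
    table = {}
    for doc in hashed_stream:  # held entirely in memory
        table.setdefault(doc[on], []).append(doc)
    out = []
    for doc in stream:  # order preserved (e.g. sorted by score)
        for match in table.get(doc[on], []):
            out.append({**match, **doc})
    return out

# items arrive sorted by score desc; facility docs in arbitrary order.
items_by_score = [{"itemId": 2, "itemDescription": "teddy bear"},
                  {"itemId": 1, "itemDescription": "bear trap"}]
facility = [{"itemId": 1, "cost": 5.0}, {"itemId": 2, "cost": 9.0}]

print([r["itemId"] for r in hash_join(items_by_score, facility, on="itemId")])
# [2, 1]  -- the score order of the non-hashed stream is kept
```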

Re: Changing Solr Schema with Data

2015-12-28 Thread Jack Krupansky
All crucial data that you don't want to delete should be stored in a
non-Solr backing store, either flat files (e.g., CSV or Solr XML), an
RDBMS, or a NoSQL database. You should always be in a position to either
fully reindex or fully discard your Solr data. Solr is not a system of
record database. Was someone telling you something different?

-- Jack Krupansky

On Mon, Dec 28, 2015 at 1:48 PM, Salman Ansari 
wrote:

> Hi,
>
> I am facing an issue where I need to change Solr schema but I have crucial
> data that I don't want to delete. Is there a way where I can change the
> schema of the index while keeping the data intact?
>
> Regards,
> Salman
>


Re: Changing Solr Schema with Data

2015-12-28 Thread Salman Ansari
You can say that we are not removing any fields (so the old data should not
get affected); however, we need to add new fields (which new data will
have). Does that answer your question?


Regards,
Salman

On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch 
wrote:

> Does the schema change affect the data you want to keep?
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 29 December 2015 at 01:48, Salman Ansari 
> wrote:
> > Hi,
> >
> > I am facing an issue where I need to change Solr schema but I have
> crucial
> > data that I don't want to delete. Is there a way where I can change the
> > schema of the index while keeping the data intact?
> >
> > Regards,
> > Salman
>


Re: Adding the same field value question

2015-12-28 Thread Jack Krupansky
Is the field multivalued?

-- Jack Krupansky

On Sun, Dec 27, 2015 at 11:16 PM, Jamie Johnson  wrote:

> What is the difference of adding a field with the same value twice or
> adding it once and boosting the field on add?  Is there a situation where
> one approach is preferred?
>
> Jamie
>
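For intuition on the difference being asked about, here is a rough sketch of classic TF/IDF-style term-frequency weighting (a simplification, not exact Lucene behavior): repeating the value raises the term frequency, which is dampened sublinearly, whereas an index-time boost scaled the field's norm linearly, so the two are generally not equivalent.

```python
import math

# Classic-similarity style term-frequency weight: sqrt(tf).
def tf_weight(tf):
    return math.sqrt(tf)

once = tf_weight(1)            # field value added once
twice = tf_weight(2)           # same value added twice -> tf doubles
print(round(twice / once, 3))  # ~1.414, i.e. less than a 2.0x boost
# An index-time boost of 2.0 instead multiplied the norm by exactly 2.0;
# duplication also lengthens the field, which further shrinks the norm.
```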


Issue with Join

2015-12-28 Thread William Bell
I am having issues with {!join}. If the core has a multiValued field and
the inner join does not have a multiValued field, it does not find the
ones...

Solr 5.3.1... 5.3.1

Example.

PS1226 is in practicing_specialties_codes in providersearch core. This
field is multiValued.

In the autosuggest core there is NOT a value of PS1226 in there. This
field is called prac_spec_code and is single-valued.


http://localhost:8983/solr/providersearch/select?q=*%3A*&wt=json&indent=true&fq=practicing_specialties_codes:PS1226&fl=practicing_specialties_codes

I get:


   "docs": [
     {
       "practicing_specialties_codes": ["PS1010", "PS282", "PS1226"]
     }
   ]



In autosuggest there is nothing:

http://localhost:8983/solr/autosuggest/select?q=*%3A*&wt=json&indent=true&fq=prac_spec_code:PS1226&fl=prac_spec_code

Nothing.

Then a join should find what is in providersearch but missing in
autosuggest.

http://localhost:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fq=practicing_specialties_codes:PS1226&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}auto_type:PRACSPEC

or

http://hgsolr2sl1:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}auto_type:PRACSPEC

or

http://hgsolr2sl1:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}*:*

I also tried *:* AND NOT {!join}

I get 0 results. This seems to be a bug.

{
  "responseHeader": {
    "status": 0,
    "QTime": 178,
    "params": {
      "q": "*:*",
      "fl": "practicing_specialties_codes",
      "fq": "NOT {!join from=prac_spec_code to=practicing_specialties_codes fromIndex=autosuggest}*:*",
      "rows": "10",
      "wt": "json",
      "debugQuery": "true"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "*:*",
    "querystring": "*:*",
    "parsedquery": "MatchAllDocsQuery(*:*)",
    "parsedquery_toString": "*:*",
    "explain": {},
    "QParser": "LuceneQParser",
    "filter_queries": [
      "NOT {!join from=prac_spec_code to=practicing_specialties_codes fromIndex=autosuggest}*:*"
    ],
    "parsed_filter_queries": [
      "-JoinQuery({!join from=prac_spec_code to=practicing_specialties_codes fromIndex=autosuggest}*:*)"
    ],
    "timing": {
      "time": 177,
      "prepare": {
        "time": 0,
        "query": { "time": 0 },
        "facet": { "time": 0 },
        "facet_module": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "expand": { "time": 0 },
        "debug": { "time": 0 }
      },
      "process": {
        "time": 177,
        "query": { "time": 177 },
        "facet": { "time": 0 },
        "facet_module": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "expand": { "time": 0 },
        "debug": { "time": 0 }
      }
    }
  }
}




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: how's multi-query scoring?

2015-12-28 Thread Jason
Hi, Binoy
Thanks for your reply.

I've found why the score is the same.
If I query with test:(chloro-4-hydroxy), then the score is the same.
But querying with test:(chloro 4 hydroxy), the score of id 'test1' is
bigger than that of 'test2'.
So the pf parameter under edismax is only applied to explicitly separated
queries.
In my case, querying with test:(chloro-4-hydroxy) generates an implicit term
query like (chloro 4 hydroxy) via the WordDelimiterFilter,
so pf is not applied.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-s-multi-query-scoring-tp4247512p4247648.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how's multi-query scoring?

2015-12-28 Thread Binoy Dalal
Precisely. You can change how Solr generates your phrase queries by
tweaking the WordDelimiterFilterFactory settings or by setting the
autoGeneratePhraseQueries parameter for your fields to true or false.
This will also determine whether or not pf boosting is applied.
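A toy illustration of the splitting being discussed (an assumption of default WordDelimiterFilter-like behavior, splitting on intra-word delimiters and letter/digit boundaries; not the actual filter code):

```python
import re

# Rough WordDelimiterFilter-style split: break on '-'/'_' and on
# letter<->digit boundaries, so "chloro-4-hydroxy" -> three tokens.
def word_delimiter_split(token):
    parts = re.split(r"[-_]+", token)
    tokens = []
    for part in parts:
        tokens.extend(re.findall(r"[A-Za-z]+|[0-9]+", part))
    return tokens

print(word_delimiter_split("chloro-4-hydroxy"))  # ['chloro', '4', 'hydroxy']
```

Those three implicitly generated tokens are what the query ends up matching, which is why the phrase boost does not kick in unless the terms are separated explicitly.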

On Tue, 29 Dec 2015, 10:41 Jason  wrote:

> Hi, Binoy
> Thanks your reply.
>
> I've found why score is same.
> If I query with test:(chloro-4-hydroxy), then score is same.
> But quering with test:(chloro 4 hydroxy), thean score of id 'test1' is
> bigger than 'test2'.
> So pf parameter under edismax is only applied to explicitly separated
> queries.
> In my case, quering with test:(chloro-4-hydroxy) generates implicit term
> query like (chloro 4 hydroxy) by worddelimiterfilter.
> So pf is not apppied.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-s-multi-query-scoring-tp4247512p4247648.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Changing Solr Schema with Data

2015-12-28 Thread Shalin Shekhar Mangar
Adding new fields is not a problem. You can continue to use your
existing index with the new schema.

On Tue, Dec 29, 2015 at 1:58 AM, Salman Ansari  wrote:
> You can say that we are not removing any fields (so the old data should not
> get affected), however, we need to add new fields (which new data will
> have). Does that answer your question?
>
>
> Regards,
> Salman
>
> On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch 
> wrote:
>
>> Does the schema change affect the data you want to keep?
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 29 December 2015 at 01:48, Salman Ansari 
>> wrote:
>> > Hi,
>> >
>> > I am facing an issue where I need to change Solr schema but I have
>> crucial
>> > data that I don't want to delete. Is there a way where I can change the
>> > schema of the index while keeping the data intact?
>> >
>> > Regards,
>> > Salman
>>



-- 
Regards,
Shalin Shekhar Mangar.