DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Mohsin Beg Beg

Hello,

It seems Solr is caching when faceting even with fq={!cache=false}*:* specified. 
This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51.

Query 1) No cache entry in filterCache, as expected
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*
http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
confirms this.

Query 2) Query result docset cached in filterCache unexpectedly?
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum
http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache shows 
an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached.

Any suggestions as to why, or how this may be avoided? I don't want to cache anything 
other than facet(ed) terms in the filterCache (for predictable heap usage).

The culprit seems to be line 1431 @ 
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_10_2/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?view=markup

Thanks.

-M


How to specify a property for all cores

2014-11-11 Thread Andreas Hubold

Hi,

I'm using Solr 4.10.1 with the new solr.xml format (auto-discovered 
cores). I'm trying to set a property that I can reference in 
solrconfig.xml files of all cores.


I know I can use JVM system properties or add the property to each 
core's core.properties file. Is there another possibility? I don't want 
to use system properties as this would affect the whole servlet container.


Can I specify the property in a central configuration file?

I found some discussion about a solr.properties file, but there's no 
such thing in Solr 4.10, is there? (See SOLR-4615.)
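
For illustration, this is the kind of substitution I mean -- a sketch with a 
made-up property name, which today I would have to repeat in every core's 
core.properties:

# core.properties
my.data.dir=/var/solr/data

<!-- solrconfig.xml; the part after the colon is a fallback default -->
<dataDir>${my.data.dir:/default/path}</dataDir>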


Regards,
Andreas


Removing Common Web Page Header and Footer from content

2014-11-11 Thread Moumita Dhar01
Hi,

I am using Nutch 1.9 and Solr 4.6 to index a web application with approximately 
100 distinct URLs and their contents.

Nutch is used to fetch the URLs and links, crawl the entire web application 
to extract the content of all pages, and send the content to Solr.

The problem I have now is that the first 1000 or so characters and the 
last 400 or so characters of each page, which are the common header and footer, are 
showing up in the search results.

Is there a way to ignore the links or keep only the static text in the content?

Any useful pointers would be highly appreciated.


Regards,
Moumita Dhar




Re: Proper way to backup solr.

2014-11-11 Thread elmerfudd
First, I want to thank you for your response!

Can you provide more information about the suggested hardlink solution?
What are the advantages and disadvantages of using it?

Can you provide an example, please?


Meanwhile, I'll try to read about it and test it myself ASAP.


thanks!








Re: create new core based on named config set using the admin page

2014-11-11 Thread Andreas Hubold

Okay, I've created https://issues.apache.org/jira/browse/SOLR-6728

Erick Erickson wrote on 11/06/2014 08:00 PM:

Yeah, please create a JIRA. There are a couple of umbrella JIRAs that
you might want to link it to.
I'm not sure it quite fits in either; if not, just let it hang out there:

https://issues.apache.org/jira/browse/SOLR-6703
https://issues.apache.org/jira/browse/SOLR-6084

On Wed, Nov 5, 2014 at 11:57 PM, Andreas Hubold
andreas.hub...@coremedia.com wrote:

Hi,

Solr 4.8 introduced named config sets with
https://issues.apache.org/jira/browse/SOLR-4478. You can create a new core
based on a config set with the CoreAdmin API as described in
https://cwiki.apache.org/confluence/display/solr/Config+Sets
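e.g. with a call of roughly this form (the core and config set names here are 
placeholders):

http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&configSet=myConfigSet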

The Solr Admin page allows the creation of new cores as well. There's an Add
Core button in the Core Admin tab. This opens a dialog where you can
enter the name, instanceDir, dataDir, and the names of solrconfig.xml /
schema.xml. It would be cool and consistent if one could create a core based
on a named config set here as well.

I'm asking because I might have overlooked something or maybe somebody is
already working on this. But probably I should just create a JIRA issue,
right?

Regards,
Andreas

Ramzi Alqrainy wrote on 11/05/2014 08:24 PM:


Sorry, I did not get your point, can you please elaborate more?










--
Andreas Hubold
Software Architect

tel +49.40.325587.519
fax +49.40.325587.999
andreas.hub...@coremedia.com

CoreMedia AG
content | context | conversion

Ludwig-Erhard-Str. 18
20459 Hamburg, Germany
www.coremedia.com

Executive Board: Gerrit Kolb (CEO), Dr. Klemens Kleiminger (CFO)
Supervisory Board: Prof. Dr. Florian Matthes (Chairman)
Trade Register: Amtsgericht Hamburg, HR B 76277



I want to translate solr wiki to Korean.

2014-11-11 Thread Jeon Woosung
In Korea, only a few people can read English well, so it is difficult to
use Solr, but I want Solr to spread. So I would like to translate the Solr
wiki into Korean. Is there a good way to translate it?


Can I select dummy field(for count) from solr?

2014-11-11 Thread suhyunjeon
I want to show a cumulative graph in the Banana framework (SiLK):
https://docs.lucidworks.com/display/SiLK/Banana

There is no cumulative graph, so I want to select something like count(*) from the Solr
collection, as a dummy field.

I would then sum that field and show a histogram graph -- roughly, a constant
field as sketched below. Do you have any ideas?
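
A minimal sketch of such a field in schema.xml (the field name is made up);
every document would carry the constant value 1, so summing it gives the
document count:

<field name="one" type="int" indexed="true" stored="true" default="1" />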

It does not work like this (q = *:count(*)):
http://lucene.472066.n3.nabble.com/file/n4168713/solr.png 

Help me please. 
Have a nice day!






Parent query yields document which is not matched by parents filter

2014-11-11 Thread ku3ia
Hi, folks!

We are using a parent/child architecture in our project, and sometimes when
using the child transformer ([child]) there is an exception:

Parent query yields document which is not matched by parents filter,
docID=...

Example queries:
http://localhost/solr/core/select?fq=id:123456789&fl=*&q={!child
of=DocumentType:parent}Text:foo
http://localhost/solr/core/select?q=id:123456789&fl=*,[child
parentFilter=DocumentType:parent]

Documents in the index:
<doc>
  <id>123456789_0</id>
  <Text>foo</Text>
</doc>
<doc>
  <id>123456789</id>
  <DocumentType>parent</DocumentType>
  ... other fields
</doc>

The _root_ field is present in schema.xml.

When I optimize down to one segment, the error sometimes disappears, but
sometimes it does not. Please advise how to fix this.
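
For reference, this is roughly how a correctly indexed block would look as
update XML, with the child nested inside the parent (a sketch based on the
fields above; my understanding is that the error typically appears when
parents and children are not indexed together as one block):

<add>
  <doc>
    <field name="id">123456789</field>
    <field name="DocumentType">parent</field>
    <doc>
      <field name="id">123456789_0</field>
      <field name="Text">foo</field>
    </doc>
  </doc>
</add>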



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Parent-query-yields-document-which-is-not-matched-by-parents-filter-tp4168727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Suggester not suggesting anything using DictionaryCompoundWordTokenFilterFactory

2014-11-11 Thread Thomas Michael Engelke
 I think I found the problem. The definition of the suggester component
has a field option which references the field that the suggester uses
to generate suggestions. Changing this to the field that uses the
DictionaryCompoundWordTokenFilterFactory makes it suggest word parts as well.
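
For the record, the relevant part of the component definition now looks
roughly like this (a sketch; the component name and lookup implementation are
assumptions, not copied from my config):

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <!-- the field the suggester draws its terms from -->
    <str name="field">text_suggest_dictionary_ngram</str>
  </lst>
</searchComponent>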

Thomas Michael Engelke wrote on 11.11.2014 08:52: 

 I'm toying around with the suggester component, as described here: 
 http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx
  [1]
 
 So I made 4 fields:
 
 <field name="text_suggest" type="text_suggest" indexed="true" stored="true" 
 multiValued="true" />
 <copyField source="name" dest="text_suggest" />
 <field name="text_suggest_edge" type="text_suggest_edge" indexed="true" 
 stored="true" multiValued="true" />
 <copyField source="name" dest="text_suggest_edge" />
 <field name="text_suggest_ngram" type="text_suggest_ngram" indexed="true" 
 stored="true" multiValued="true" />
 <copyField source="name" dest="text_suggest_ngram" />
 <field name="text_suggest_dictionary_ngram" 
 type="text_suggest_dictionary_ngram" indexed="true" stored="true" 
 multiValued="true" />
 <copyField source="name" dest="text_suggest_dictionary_ngram" />
 
 with the corresponding definitions:
 
 <fieldType name="text_suggest" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 <fieldType name="text_suggest_edge" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
     side="front" />
   </analyzer>
 </fieldType>
 <fieldType name="text_suggest_ngram" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
     side="front" />
     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
 </fieldType>
 <fieldType name="text_suggest_dictionary_ngram" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.DictionaryCompoundWordTokenFilterFactory" 
     dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" 
     maxSubwordSize="30" onlyLongestMatch="false" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" 
     side="front" />
     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 
 I'm calling the suggester component this way:
 
 http://address:8983/solr/core/suggest?qf=text_suggest^6.0%20text_suggest_edge^3.0%20text_suggest_ngram^1.0%20text_suggest_dictionary_ngram^0.2&q=wa
 
 This seems to work fine:
 
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <lst name="spellcheck">
     <lst name="suggestions">
       <lst name="wa">
         <int name="numFound">5</int>
         <int name="startOffset">0</int>
         <int name="endOffset">2</int>
         <arr name="suggestion">
           <str>wandelement aus gitter</str>
           <str>wandelement aus stahlblech</str>
           <str>wandelement</str>
           <str>wandhalter für prospekte</str>
           <str>wandascher, h 300 × b 230 × t 60 mm</str>
         </arr>
       </lst>
       <str name="collation">(wandelement aus gitter)</str>
     </lst>
   </lst>
 </response>
 
 However, I added the fourth field so I could get low-boosted suggestions 
 using the aforementioned DictionaryCompoundWordTokenFilterFactory. A sample 
 analysis for the field (type) text_suggest_dictionary_ngram for the word 
 Geländewagen:
 
 g
 ge
 gel
 gelä
 gelän
 geländ
 gelände
 geländew
 geländewa
 geländewag
 geländewage
 geländewagen
 g
 ge
 gel
 gelä
 gelän
 geländ
 gelände
 w
 wa
 wag
 wage
 wagen
 
 As we can see, the DictionaryCompoundWordTokenFilterFactory extracts the word 
 "wagen" and EdgeNGrams it. However, I cannot get results from these NGrams: 
 trying "wag" as the search term for the suggester returns no results.
 
 However, analyzing "Geländewagen" (as the index field value) and "wag" 
 (as the query field value) shows a match.
 
 I had the thought that it might be because the underlying component of the 
 suggester is a spellchecker, and a spellchecker wouldn't correct "wag" to 
 "wagen" because there was an NGram that spelled "wag", so the word was 
 already spelled correctly. So I tried without the EdgeNGrams, but the result 
 stays the same.
 

Links:
--
[1]
http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx

How to suggest from multiple fields?

2014-11-11 Thread Thomas Michael Engelke
Like in this article 
(http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx), 
I am using multiple fields to generate different options for an 
autosuggest functionality:

- First, the whole field (top priority)
- Then, the whole field as EdgeNGrams from the left side (normal priority)
- Lastly, single words or word parts (compound words) as EdgeNGrams

However, I have not been very successful in supplying a single requestHandler 
(/suggest) with data from multiple suggesters, and I have not been 
able to find any example of how this might be done correctly.

Is there a sample I can read, or documentation on how this might 
be done? The referenced article did it, yet only marginally 
described the technical implementation.


Re: Removing Common Web Page Header and Footer from content

2014-11-11 Thread Ahmet Arslan
Hi Moumita,

I once used https://code.google.com/p/boilerpipe/ to remove common 
headers/footers etc.
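
Its core API is small; a minimal sketch in Java (the html variable is a
placeholder for the fetched page source):

import de.l3s.boilerpipe.BoilerpipeProcessingException;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

// Keeps the main article text and drops boilerplate such as headers,
// footers and navigation; throws BoilerpipeProcessingException on failure.
String mainText = ArticleExtractor.INSTANCE.getText(html);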

Ahmet



On Tuesday, November 11, 2014 10:41 AM, Moumita Dhar01 
moumita_dha...@infosys.com wrote:
Hi,

I am using Nutch 1.9 and Solr 4.6 to index a web application with approximately 
100 distinct URLs and their contents.

Nutch is used to fetch the URLs and links, crawl the entire web application 
to extract the content of all pages, and send the content to Solr.

The problem I have now is that the first 1000 or so characters and the 
last 400 or so characters of each page, which are the common header and footer, are 
showing up in the search results.

Is there a way to ignore the links or keep only the static text in the content?

Any useful pointers would be highly appreciated.


Regards,
Moumita Dhar




Re: Analytics result for each Result Group

2014-11-11 Thread Talat Uyarer
Hi Anurag,

Where can I find the median function? I use it a lot.

2014-11-09 20:39 GMT+02:00 Anurag Sharma anura...@gmail.com:
 Can a function query (http://wiki.apache.org/solr/FunctionQuery) serve your
 use case?

 On Wed, Nov 5, 2014 at 3:36 PM, Talat Uyarer ta...@uyarer.com wrote:

 I searched the wiki pages about that but did not find any documentation. If
 you can help me, I will be glad.

 Thanks

 2014-11-04 11:34 GMT+02:00 Talat Uyarer ta...@uyarer.com:
  Hi folks,
 
  We use Analytics Component for median, max etc. I wonder if I use
  group.field parameter with analytics component, How to calculate
  analytics for each result group ?
 
  Thanks
 
  --
  Talat UYARER
  Websitesi: http://talat.uyarer.com
  Twitter: http://twitter.com/talatuyarer
  Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304



 --
 Talat UYARER
 Websitesi: http://talat.uyarer.com
 Twitter: http://twitter.com/talatuyarer
 Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304




-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304


Re: Proper way to backup solr.

2014-11-11 Thread Shawn Heisey
On 11/11/2014 1:45 AM, elmerfudd wrote:
 First, I want to thank you for your response!
 
 Can you provide more information about the suggested hardlink solution?
 What are the advantages and disadvantages of using it?
 
 Can you provide an example, please?
 
 
 Meanwhile, I'll try to read about it and test it myself ASAP.

Something like this:

# create the target directory and clear any previous backup contents
mkdir -p ${BACKUPDIR}/corename/index
rm -f ${BACKUPDIR}/corename/index/*
# hard-link (-l) the current index files, preserving attributes (-p)
cp -pl ${SOLRHOME}/corename/data/index/* ${BACKUPDIR}/corename/index/.

This does not include necessary additional steps like renaming previous
backups to set up an auto-rotating archive.  Because only you know what
your requirements are when it comes to backup archives, you'll need to
fill that part in.  As already mentioned, the source and destination
must be on the same filesystem.

There are very few disadvantages to this solution.  It maintains
instantaneous backups of previous index states with as little overhead
as possible.

Note that if you have a filesystem with good built-in snapshot support
(typically zfs or btrfs), you can use filesystem snapshots instead, with
much the same effect.

Thanks,
Shawn



Re: I want to translate solr wiki to Korean.

2014-11-11 Thread Steve Rowe
Hi Jeon Woosung,

The Solr community wiki is no longer the official Solr documentation location.  

The Solr Reference Guide is where Solr documentation is now maintained: 
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide.

I’m not sure what you mean when you ask “Is there a good way to translate 
it?”  Are you asking about strategy?  Or tools?  Hmm, maybe you could translate 
each page using Google Translate or a similar service, then refine the 
translation manually?

I think you may be the first to offer to translate Solr documentation.  Perhaps 
we could create a new space in the Apache Confluence instance, where the Solr 
Reference Guide is, to host the translated documentation?

Probably it would be best to create a Solr JIRA issue where this topic can be 
discussed: https://issues.apache.org/jira/browse/SOLR.

Thanks for contributing!

Steve

 On Nov 11, 2014, at 4:30 AM, Jeon Woosung jeonwoos...@gmail.com wrote:
 
 In Korea, only a few people can read English well, so it is difficult to
 use Solr, but I want Solr to spread. So I would like to translate the Solr
 wiki into Korean. Is there a good way to translate it?



Re: DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Shawn Heisey
On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote:
 It seems Solr is caching when faceting even with fq={!cache=false}*:* 
 specified. This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51.
 
 Query 1) No cache entry in filterCache, as expected
 http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*
 http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
 confirms this.
 
 Query 2) Query result docset cached in filterCache unexpectedly?
 http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum
 http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
 shows an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached.
 
 Any suggestions as to why, or how this may be avoided? I don't want to cache 
 anything other than facet(ed) terms in the filterCache (for predictable heap 
 usage).

I hope this is just for testing, because fq=*:* is completely
unnecessary, and will cause Solr to do extra work that it doesn't need
to do.

Try changing that second query so q and fq are not the same, so you can
see for sure which one is producing the filterCache entry.  With the
same query for both, you cannot know which one is populating the
filterCache.  If it's coming from the q parameter, then it's probably
working as designed.  If it comes from the fq, then we probably actually
do have a problem that needs investigation.

Thanks,
Shawn



Re: Lucene to Solrcloud migration

2014-11-11 Thread Michal Krajňanský
Hi Eric, Michael,

thank you both for your comments.

2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 bq: - the documents are organized in shards according to date (integer)
 and
 language (a possibly extensible discrete set)

 bq: - the indexes are disjunct

 OK, I'm having a hard time getting my head around these two statements.

 If the indexes are disjunct in the sense that you only search one at a
 time,
 then they are different collections in SolrCloud jargon.


I just meant that every document is contained in a single one of the
indexes. I have a lot of Lucene indexes for various [language X timespan]
combinations, but logically we are speaking about a single huge index. That is why I
thought it would be natural to represent it as a single SolrCloud
collection.

If, on the other hand, these are a big collection and you want to search
 them all with a single query, I suggest that in SolrCloud land you don't
 want them to be discrete shards. My reasoning here is that let's say you
 have a bunch of documents for October, 2014 in Spanish. By putting these
 all on a single shard, your queries all have to be serviced by that one
 shard. You don't get any parallelism.


That is right. Actually, parallelization is not the main issue right
now. The queries are very sparse; currently our system does not support
load balancing at all. I imagined that in the future it could be achievable
via SolrCloud replication.

The main consideration is to be able to plug the indexes in and out on
demand. The total size of the data is in terabytes. We usually want to
search only the latest indexes, but occasionally it is necessary to plug in
one of the older ones.

Maybe (probably) I still have some misconceptions about the uses of
SolrCloud...

If it really does make sense in your case to route all the docs to a
 single shard,
 then Michael's comment is spot-on: use the compositeId router.


You confuse me here. I was not thinking about a single shard; on the
contrary, each [language X timespan] index would itself be a shard. I agree
that the compositeId router seems to be natural for what I need. I am currently
searching for a way to convert my indexes so that my document
IDs have the composite format. Currently these are just unique integers,
so I would like to prefix all the document IDs of an index with its
language and timespan. I do not know how, but I believe this should be
possible, as it is a constant operation that would not change the structure
of the index.
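
Concretely, I imagine rewriting an id such as 12345 from, say, the Spanish /
October 2014 index into the compositeId form (the prefix scheme here is my
own invention):

es_201410!12345

where the part before the ! determines which shard the document hashes to.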

Best,

Michal



 Best,
 Erick

 On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
  Hi Michal,
 
  Is there a particular reason to shard your collections like that? If it
 was
  mainly for ease of operations, I'd consider just using CompositeId to
  prevent specific types of queries hotspotting particular nodes.
 
  If your ingest rate is fast, you might also consider making each
  collection an alias that points to many actual collections, and
  periodically closing off a collection and starting a new one. This
 prevents
  cache churn and the impact of large merges.
 
  Michael
 
 
 
  On 11/10/14 08:03, Michal Krajňanský wrote:
 
  Hi All,
 
  I have been working on a project that has long employed Lucene indexer.
 
  Currently, the system implements a proprietary document routing and
 index
  plugging/unplugging on top of Lucene and of course contains a great
  body of indexes. Recently an idea came up to migrate from Lucene to
  Solrcloud, which appears to be more powerful than our proprietary
 system.
 
  Could you suggest the best way to seamlessly migrate the system to use
  Solrcloud, when the reindexing is not an option?
 
  - all the existing indexes represent a single collection in terms of
  Solrcloud
  - the documents are organized in shards according to date (integer)
 and
  language (a possibly extensible discrete set)
  - the indexes are disjunct
 
  I have been able to convert the existing indexes to the newest Lucene
  version and plug them individually into the Solrcloud. However, there is
  the question of routing, sharding etc.
 
  Any insight appreciated.
 
  Best,
 
 
  Michal Krajnansky
 
 



Re: Lucene to Solrcloud migration

2014-11-11 Thread Michael Della Bitta
Yeah, Erick confused me a bit too, but I think what he's talking about 
takes for granted that you'd have your various indexes directly set up 
as individual collections.


If instead you're considering one big collection, or a few collections 
based on aggregations of your individual indexes, having big, 
multisharded collections using compositeId should work, unless there's a 
use case we're not discussing.


Michael

On 11/11/14 10:27, Michal Krajňanský wrote:

Hi Eric, Michael,

thank you both for your comments.

2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com:


bq: - the documents are organized in shards according to date (integer)
and
language (a possibly extensible discrete set)

bq: - the indexes are disjunct

OK, I'm having a hard time getting my head around these two statements.

If the indexes are disjunct in the sense that you only search one at a
time,
then they are different collections in SolrCloud jargon.



I just meant that every document is contained in a single one of the
indexes. I have a lot of Lucene indexes for various [language X timespan],
but logically we are speaking about a single huge index. That is why I
thought it would be natural to represent it as a single SolrCloud
collection.

If, on the other hand, these are a big collection and you want to search

them all with a single query, I suggest that in SolrCloud land you don't
want them to be discrete shards. My reasoning here is that let's say you
have a bunch of documents for October, 2014 in Spanish. By putting these
all on a single shard, your queries all have to be serviced by that one
shard. You don't get any parallelism.



That is right. Actually the parallelization is not the main issue right
now. The queries are very sparse, currently our system does not support
load balancing at all. I imagined that in the future it could be achievable
via SolrCloud replication.

The main consideration is to be able to plug the indexes in and out on
demand. The total size of the data is in terabytes. We usually want to
search only the latest indexes but occasionally it is needed to plug in
one of the older ones.

Maybe (probably) I still have some misconceptions about the uses of
SolrCloud...

If it really does make sense in your case to route all the doc to a

single shard,
then Michael's comment is spot-on: use the compositeId router.



You confuse me here. I was not thinking about a single shard, on the
contrary, any [language X timespan] index would be itself a shard. I agree
that compositeId router seems to be natural for what I need. I am currently
searching for the way to convert my indexes in such way that my document
IDs have the composite format. Currently these are just unique integers,
so I would like to prefix all the document IDs of an index with its
language and timespan. I do not know how, but I believe this should be
possible, as it is a constant operation that would not change the structure
of the index.

Best,

Michal




Best,
Erick

On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:

Hi Michal,

Is there a particular reason to shard your collections like that? If it

was

mainly for ease of operations, I'd consider just using CompositeId to
prevent specific types of queries hotspotting particular nodes.

If your ingest rate is fast, you might also consider making each
collection an alias that points to many actual collections, and
periodically closing off a collection and starting a new one. This

prevents

cache churn and the impact of large merges.

Michael



On 11/10/14 08:03, Michal Krajňanský wrote:

Hi All,

I have been working on a project that has long employed Lucene indexer.

Currently, the system implements a proprietary document routing and

index

plugging/unplugging on top of Lucene and of course contains a great
body of indexes. Recently an idea came up to migrate from Lucene to
Solrcloud, which appears to be more powerful than our proprietary

system.

Could you suggest the best way to seamlessly migrate the system to use
Solrcloud, when the reindexing is not an option?

- all the existing indexes represent a single collection in terms of
Solrcloud
- the documents are organized in shards according to date (integer)

and

language (a possibly extensible discrete set)
- the indexes are disjunct

I have been able to convert the existing indexes to the newest Lucene
version and plug them individually into the Solrcloud. However, there is
the question of routing, sharding etc.

Any insight appreciated.

Best,


Michal Krajnansky





Re: Lucene to Solrcloud migration

2014-11-11 Thread Michal Krajňanský
Hm. So I found that one can update stored fields with an atomic update
operation; however, according to
http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4
this will not work for the uniqueKey. So I guess with the compositeId router I am
out of luck.

I have also been searching for a way to implement my own routing mechanism.
This seems to be the cleaner solution anyway -- I would not need to modify the
existing index, just compute the hash from (stored) fields other than the
document id. Can you confirm that this is possible? The documentation is
very sparse, however (I only found that it is possible to specify a custom
hash function).

Best,

Michal

2014-11-11 16:48 GMT+01:00 Michael Della Bitta 
michael.della.bi...@appinions.com:

 Yeah, Erick confused me a bit too, but I think what he's talking about
 takes for granted that you'd have your various indexes directly set up as
 individual collections.

 If instead you're considering one big collection, or a few collections
 based on aggregations of your individual indexes, having big, multisharded
 collections using compositeId should work, unless there's a use case we're
 not discussing.

 Michael


 On 11/11/14 10:27, Michal Krajňanský wrote:

 Hi Eric, Michael,

 thank you both for your comments.

 2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com:

  bq: - the documents are organized in shards according to date (integer)
 and
 language (a possibly extensible discrete set)

 bq: - the indexes are disjunct

 OK, I'm having a hard time getting my head around these two statements.

 If the indexes are disjunct in the sense that you only search one at a
 time,
 then they are different collections in SolrCloud jargon.


  I just meant that every document is contained in a single one of the
 indexes. I have a lot of Lucene indexes for various [language X timespan],
 but logically we are speaking about a single huge index. That is why I
 thought it would be natural to represent it as a single SolrCloud
 collection.

 If, on the other hand, these are a big collection and you want to search

 them all with a single query, I suggest that in SolrCloud land you don't
 want them to be discrete shards. My reasoning here is that let's say you
 have a bunch of documents for October, 2014 in Spanish. By putting these
 all on a single shard, your queries all have to be serviced by that one
 shard. You don't get any parallelism.


  That is right. Actually the parallelization is not the main issue right
 now. The queries are very sparse, currently our system does not support
 load balancing at all. I imagined that in the future it could be
 achievable
 via SolrCloud replication.

 The main consideration is to be able to plug the indexes in and out on
 demand. The total size of the data is in terabytes. We usually want to
 search only the latest indexes but occasionally it is needed to plug in
 one of the older ones.

 Maybe (probably) I still have some misconceptions about the uses of
 SolrCloud...

 If it really does make sense in your case to route all the doc to a

 single shard,
 then Michael's comment is spot-on: use the compositeId router.


  You confuse me here. I was not thinking about a single shard, on the
 contrary, any [language X timespan] index would be itself a shard. I agree
 that compositeId router seems to be natural for what I need. I am
 currently
 searching for the way to convert my indexes in such way that my document
 IDs have the composite format. Currently these are just unique integers,
 so I would like to prefix all the document IDs of an index with its
 language and timespan. I do not know how, but I believe this should be
 possible, as it is a constant operation that would not change the
 structure
 of the index.

 Best,

 Michal



  Best,
 Erick

 On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:

 Hi Michal,

 Is there a particular reason to shard your collections like that? If it

 was

 mainly for ease of operations, I'd consider just using CompositeId to
 prevent specific types of queries hotspotting particular nodes.

 If your ingest rate is fast, you might also consider making each
 collection an alias that points to many actual collections, and
 periodically closing off a collection and starting a new one. This

 prevents

 cache churn and the impact of large merges.

 Michael



 On 11/10/14 08:03, Michal Krajňanský wrote:

 Hi All,

 I have been working on a project that has long employed Lucene indexer.

 Currently, the system implements a proprietary document routing and

 index

 plugging/unplugging on top of Lucene and of course contains a great
 body of indexes. Recently an idea came up to migrate from Lucene to
 Solrcloud, which appears to be more powerful than our proprietary

 system.

 Could you suggest the best way to seamlessly migrate the system to use
 Solrcloud, when the reindexing is not an option?

 - all the existing indexes 

Re: DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Erick Erickson
Well, the difference is that you're faceting with method=enum, which uses
the filterCache (I think, it's been a while).

I admit I'm a little surprised that when I tried faceting on the
inStock field in the standard distro I got 3 entries when there are
only two values, but I'm willing to let that go ;)

i.e. this produces 3 entries in the filterCache:
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.field=inStock&facet.method=enum

not an fq clause in sight...

Best,
Erick

On Tue, Nov 11, 2014 at 9:31 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote:
  It seems Solr is caching when faceting even with fq={!cache=false}*:* 
  specified. This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51.
 
  Query 1) No cache entry in filterCache, as expected
  http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*
  http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
  confirms this.
 
  Query 2) Query result docset cached in filterCache unexpectedly?
  http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum
  http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
  shows an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached.
 
  Any suggestions as to why, or how this may be avoided? I don't want to cache 
  anything other than facet(ed) terms in the filterCache (for predictable heap 
  usage).

 I hope this is just for testing, because fq=*:* is completely
 unnecessary, and will cause Solr to do extra work that it doesn't need
 to do.

 Try changing that second query so q and fq are not the same, so you can
 see for sure which one is producing the filterCache entry.  With the
 same query for both, you cannot know which one is populating the
 filterCache.  If it's coming from the q parameter, then it's probably
 working as designed.  If it comes from the fq, then we probably actually
 do have a problem that needs investigation.

 Thanks,
 Shawn



Re: Lucene to Solrcloud migration

2014-11-11 Thread Erick Erickson
bq:  So I guess with compositeId router I am out of luck.

No, not at all. Atomic updates are exactly about updating
a doc and NOT changing the id. A different uniqueKey is
a different doc by definition.

So you can easily use atomic updates with composite IDs
since you are changing a field of an existing doc as long
as the router bits are the same.

But that may be irrelevant...

Take a look at LotsOfCores (WARNING! this is NOT
verified in SolrCloud!). The design there is exactly to
limit the number of simultaneous cores in memory, having
them load/unload themselves based on the limits you set up.
So you can just fire queries blindly at your server where the
URL includes the core name and be confident that you'll stay
within your hardware limits.

http://wiki.apache.org/solr/LotsOfCores

If you're using SolrCloud, though, there's really no concept
of unloading specific cores/indexes on demand; it really pre-supposes
that you've scaled your system such that you can have them all
active at once. So I don't really see how routing to specific cores
is going to help you.

Then again I don't know your problem space.

Best,
Erick

On Tue, Nov 11, 2014 at 11:33 AM, Michal Krajňanský
michal.krajnan...@gmail.com wrote:
 Hm. So I found that one can update stored fields with an atomic update
 operation; however, according to
 http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4
 this will not work for the uniqueKey. So I guess with the compositeId router I am
 out of luck.

 I have also been searching for a way to implement my own routing mechanism.
 This seems to be the cleaner solution anyway -- I would not need to modify the
 existing index, just compute the hash from (stored) fields other than the
 document id. Can you confirm that this is possible? The documentation is
 very sparse, however (I only found that it is possible to specify a custom
 hash function).

 Best,

 Michal

 2014-11-11 16:48 GMT+01:00 Michael Della Bitta 
 michael.della.bi...@appinions.com:

 Yeah, Erick confused me a bit too, but I think what he's talking about
 takes for granted that you'd have your various indexes directly set up as
 individual collections.

 If instead you're considering one big collection, or a few collections
 based on aggregations of your individual indexes, having big, multisharded
 collections using compositeId should work, unless there's a use case we're
 not discussing.

 Michael


 On 11/11/14 10:27, Michal Krajňanský wrote:

 Hi Eric, Michael,

 thank you both for your comments.

 2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com:

  bq: - the documents are organized in shards according to date (integer)
 and
 language (a possibly extensible discrete set)

 bq: - the indexes are disjunct

 OK, I'm having a hard time getting my head around these two statements.

 If the indexes are disjunct in the sense that you only search one at a
 time,
 then they are different collections in SolrCloud jargon.


  I just meant that every document is contained in a single one of the
 indexes. I have a lot of Lucene indexes for various [language X timespan],
 but logically we are speaking about a single huge index. That is why I
  thought it would be natural to represent it as a single SolrCloud
 collection.

 If, on the other hand, these are a big collection and you want to search

 them all with a single query, I suggest that in SolrCloud land you don't
 want them to be discrete shards. My reasoning here is that let's say you
 have a bunch of documents for October, 2014 in Spanish. By putting these
 all on a single shard, your queries all have to be serviced by that one
 shard. You don't get any parallelism.


  That is right. Actually the parallelization is not the main issue right
 now. The queries are very sparse, currently our system does not support
 load balancing at all. I imagined that in the future it could be
 achievable
 via SolrCloud replication.

 The main consideration is to be able to plug the indexes in and out on
 demand. The total size of the data is in terabytes. We usually want to
  search only the latest indexes but occasionally it is needed to plug in
 one of the older ones.

 Maybe (probably) I still have some misconceptions about the uses of
 SolrCloud...

 If it really does make sense in your case to route all the doc to a

 single shard,
  then Michael's comment is spot-on: use the compositeId router.


  You confuse me here. I was not thinking about a single shard, on the
 contrary, any [language X timespan] index would be itself a shard. I agree
 that compositeId router seems to be natural for what I need. I am
 currently
 searching for the way to convert my indexes in such way that my document
  IDs have the composite format. Currently these are just unique integers,
  so I would like to prefix all the document IDs of an index with its
 language and timespan. I do not know how, but I believe this should be
 possible, as it is a constant operation that would not change the
 

Re: How to Facet external fields

2014-11-11 Thread bbarani
Thanks for your response. It's indeed a good idea. I will try that out.





Re: Does ReRankQuery support reranking the result of a FuzzyQuery?

2014-11-11 Thread Joel Bernstein
This issue should be resolved in
https://issues.apache.org/jira/browse/SOLR-6323.

This is committed in trunk, 5x, 4x, and 4_10, but this did not make it into
4.10.2. If you take the version in the 4_10 branch you should be good to
go. If a version 4.10.3 is cut, this will be included.

Joel Bernstein
Search Engineer at Heliosearch

On Mon, Nov 10, 2014 at 1:50 PM, Brian Sawyer bsaw...@basistech.com wrote:

 Hello,

 We are trying to make use of the new ReRankQuery to rescore results
 according to a custom function but run into problems when our main query
 includes a FuzzyQuery.

 Using the example setup in Solr 4.10.2 querying:

 q=name:Dell~1
 rq={!rerank reRankQuery=id:whatever}

 results in:
 java.lang.UnsupportedOperationException: Query name:delk~1 does not
 implement createWeight

 Is this a bug or is this intended?

 Thanks,
 Brian

 Full stack trace below:

 java.lang.UnsupportedOperationException: Query name:delk~1 does not
 implement createWeight
 at org.apache.lucene.search.Query.createWeight(Query.java:80)
 at org.apache.solr.search.ReRankQParserPlugin$ReRankWeight.<init>(ReRankQParserPlugin.java:177)
 at

 org.apache.solr.search.ReRankQParserPlugin$ReRankQuery.createWeight(ReRankQParserPlugin.java:163)
 at

 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
 at

 org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:209)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1619)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
 at

 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485)
 at

 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at

 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at

 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at

 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at

 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at

 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at

 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at

 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at

 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at

 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:722)



Re: DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Mohsin Beg Beg

Shawn, then how do I skip the filterCache for facet.method=enum?

The wiki says fq={!cache=false}*:* is OK, no?
https://wiki.apache.org/solr/SolrCaching#filterCache

-Mohsin


- Original Message -
From: erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, November 11, 2014 8:40:54 AM GMT -08:00 US/Canada Pacific
Subject: Re: DocSet getting cached in filterCache for facet request with 
{!cache=false}

Well, the difference is that you're faceting with method=enum, which uses
the filterCache (I think, it's been a while).

I admit I'm a little surprised that when I tried faceting on the
inStock field in the standard distro I got 3 entries when there are
only two values, but I'm willing to let that go ;)

i.e. this produces 3 entries in the filterCache:
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.field=inStock&facet.method=enum

not an fq clause in sight...

Best,
Erick

On Tue, Nov 11, 2014 at 9:31 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote:
 It seems Solr is caching when faceting even with fq={!cache=false}*:* 
 specified. This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51.
 
 Query 1) No cache entry in filterCache, as expected
 http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*
 http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
 confirms this.
 
 Query 2) Query result docset cached in filterCache unexpectedly?
 http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum
 http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache 
 shows an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached.
 
 Any suggestions as to why, or how this may be avoided? I don't want to cache 
 anything other than facet(ed) terms in the filterCache (for predictable heap 
 usage).

 I hope this is just for testing, because fq=*:* is completely
 unnecessary, and will cause Solr to do extra work that it doesn't need
 to do.

 Try changing that second query so q and fq are not the same, so you can
 see for sure which one is producing the filterCache entry.  With the
 same query for both, you cannot know which one is populating the
 filterCache.  If it's coming from the q parameter, then it's probably
 working as designed.  If it comes from the fq, then we probably actually
 do have a problem that needs investigation.

 Thanks,
 Shawn



Re: DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Yonik Seeley
On Tue, Nov 11, 2014 at 1:25 PM, Mohsin Beg Beg mohsin@oracle.com wrote:
 Wiki says fq={!cache=false}*:* is ok, no?

That's for the filtering... not for the faceting.

 then how to skip filterCache for facet.method=enum ?

Specify a high minDf (the minimum document frequency: the number of
documents that must match a term before the filterCache is used):

facet.enum.cache.minDf=1000
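
Applied to the earlier query, that would look roughly like:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=foobar&facet.method=enum&facet.enum.cache.minDf=1000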

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: Does ReRankQuery support reranking the result of a FuzzyQuery?

2014-11-11 Thread Joel Bernstein
Just verified that fuzzy queries work in trunk with this test:

params = new ModifiableSolrParams();
params.add("rq", "{!rerank reRankQuery=$rqq reRankDocs=6}");
params.add("q", "term_s:~1 AND test_ti:[0 TO 2000]");
params.add("rqq", "id:1^10 id:2^20 id:3^30 id:4^40 id:5^50 id:6^60");
params.add("fl", "id,score");
params.add("start", "0");
params.add("rows", "6");

assertQ(req(params), "*[count(//doc)=5]",
    "//result/doc[1]/float[@name='id'][.='6.0']",
    "//result/doc[2]/float[@name='id'][.='5.0']",
    "//result/doc[3]/float[@name='id'][.='4.0']",
    "//result/doc[4]/float[@name='id'][.='2.0']",
    "//result/doc[5]/float[@name='id'][.='1.0']"
);

Joel Bernstein
Search Engineer at Heliosearch

On Tue, Nov 11, 2014 at 1:04 PM, Joel Bernstein joels...@gmail.com wrote:

 This issue should be resolved in
 https://issues.apache.org/jira/browse/SOLR-6323.

 This is committed in trunk, 5x, 4x, and 4_10, but this did not make it
 into 4.10.2. If you take the version in the 4_10 branch you should be good
 to go. If a version 4.10.3 is cut, this will be included.

 Joel Bernstein
 Search Engineer at Heliosearch

 On Mon, Nov 10, 2014 at 1:50 PM, Brian Sawyer bsaw...@basistech.com
 wrote:

 Hello,

 We are trying to make use of the new ReRankQuery to rescore results
 according to a custom function but run into problems when our main query
 includes a FuzzyQuery.

 Using the example setup in Solr 4.10.2 querying:

 q=name:Dell~1
 rq={!rerank reRankQuery=id:whatever}

 results in:
 java.lang.UnsupportedOperationException: Query name:delk~1 does not
 implement createWeight

 Is this a bug or is this intended?

 Thanks,
 Brian

 Full stack trace below:

 java.lang.UnsupportedOperationException: Query name:delk~1 does not
 implement createWeight
 at org.apache.lucene.search.Query.createWeight(Query.java:80)
  at org.apache.solr.search.ReRankQParserPlugin$ReRankWeight.<init>(ReRankQParserPlugin.java:177)
 at

 org.apache.solr.search.ReRankQParserPlugin$ReRankQuery.createWeight(ReRankQParserPlugin.java:163)
 at

 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
 at

 org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:209)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1619)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433)
 at

 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
 at

 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485)
 at

 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at

 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at

 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at

 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at

 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at

 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at

 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at

 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at

 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at

 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at

 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at

 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at

 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at

 

SOLRJ Atomic updates of String field

2014-11-11 Thread bbarani
I am using the code below to do a partial update (in Solr 4.2):

Map<String, Object> partialUpdate = new HashMap<String, Object>();
partialUpdate.put("set", value);  // "value" stands for the new field content
doc.setField("description", partialUpdate);
server.add(docs);
server.commit();

I am seeing the description value below, with {set=...} embedded. Any idea why this
is getting added?

<str name="description">
{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
for faster processing and longer battery life, the M8 motion coprocessor to
track speed, distance and elevation, and with an 8MP iSight camera, you can
record 1080p HD Video at 60 FPS!}
</str>
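
One thing worth checking (an assumption about the likely cause, not a
confirmed diagnosis): atomic updates require the update log and a _version_
field to be configured; without them the set map can end up stored verbatim.
Roughly:

<!-- solrconfig.xml -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<!-- schema.xml -->
<field name="_version_" type="long" indexed="true" stored="true" />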





Re: Analytics result for each Result Group

2014-11-11 Thread Anurag Sharma
Probably sum and division can be applied to get the median.
If you are using a version above 5,
http://svn.apache.org/repos/asf/lucene/dev//trunk/solr/contrib/analytics/src/java/org/apache/solr/analytics/statistics/MedianStatsCollector.java
can be used directly.

On Tue, Nov 11, 2014 at 5:49 PM, Talat Uyarer ta...@uyarer.com wrote:

 Hi Anurag,

 Where can I find the median function? I use it a lot.

 2014-11-09 20:39 GMT+02:00 Anurag Sharma anura...@gmail.com:
  Can a function query (http://wiki.apache.org/solr/FunctionQuery) serve
 your
  use case?
 
  On Wed, Nov 5, 2014 at 3:36 PM, Talat Uyarer ta...@uyarer.com wrote:
 
  I searched the wiki pages about that but did not find any documentation.
  I would be glad if you could help me.
 
  Thanks
 
  2014-11-04 11:34 GMT+02:00 Talat Uyarer ta...@uyarer.com:
   Hi folks,
  
   We use the Analytics Component for median, max, etc. If I use the
   group.field parameter with the Analytics Component, how can I calculate
   analytics for each result group?
  
   Thanks
  
   --
   Talat UYARER
   Websitesi: http://talat.uyarer.com
   Twitter: http://twitter.com/talatuyarer
   Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
 
 
 
  --
  Talat UYARER
  Websitesi: http://talat.uyarer.com
  Twitter: http://twitter.com/talatuyarer
  Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
 



 --
 Talat UYARER
 Websitesi: http://talat.uyarer.com
 Twitter: http://twitter.com/talatuyarer
 Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
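
An aside, not from the thread: on stock Solr 4.x the StatsComponent is a more
widely available alternative that can break statistics down per facet value
(it offers min/max/mean/stddev, though no median). A sketch, with field names
that are purely illustrative:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=category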



Re: How to suggest from multiple fields?

2014-11-11 Thread Michael Sokolov
The usual approach is to use copyField to copy multiple fields to a 
single field.
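
For reference, a minimal schema.xml sketch of that approach (the field and
type names here are illustrative, not taken from the thread):

<field name="title" type="text_general" indexed="true" stored="true" />
<field name="author" type="text_general" indexed="true" stored="true" />
<field name="suggest_text" type="text_general" indexed="true" stored="false"
       multiValued="true" />
<copyField source="title" dest="suggest_text" />
<copyField source="author" dest="suggest_text" />

The suggester (or an EdgeNGram-analyzed autocomplete field) is then pointed
at the single suggest_text field.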


I posted a solution using an UpdateRequestProcessor to merge fields, but 
with different analyzers, here:


https://blog.safaribooksonline.com/2014/04/15/search-suggestions-with-solr-2/

My latest approach is this:

https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/spelling/suggest/MultiSuggester.java

which merges the fields while building the suggester index, allowing us 
to provide different weights for suggestions from different fields.


-Mike

On 11/11/2014 06:59 AM, Thomas Michael Engelke wrote:
Like in this article 
(http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx), 
I am using multiple fields to generate different options for an 
autosuggest functionality:


- First, the whole field (top priority)
- Then, the whole field as EdgeNGrams from the left side (normal priority)
- Lastly, single words or word parts (compound words) as EdgeNGrams

However, I was not very successful in supplying a single 
requestHandler (/suggest) with data from multiple suggesters. I have 
also not been able to find any sample of how this might be done 
correctly.


Is there a sample that I can read, or documentation of how this
might be done? The referenced article does this, yet only
marginally describes the technical implementation.




Different ids for the same document in different replicas.

2014-11-11 Thread S.L
Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard and 6 replicas on SolrCloud 4.10.1, and only a small number of
documents (~375) that are replicated across the six replicas.

The interesting thing is that the same document has a different id in
each one of those replicas.

This is causing fq=id:xyz type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml.
Is this the right way to specify an auto-generated id in SolrCloud?

<field name="id" type="uuid" indexed="true" stored="true"
required="true" multiValued="false" />


Thanks.


Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
“uuid” isn’t an out of the box field type that I’m familiar with.

Generally, I’d stick with the out of the box advice of the schema.xml file, 
which includes things like….

   <!-- Only remove the id field if you have a very good reason to. While not strictly
     required, it is highly recommended. A uniqueKey is present in almost all Solr
     installations. See the uniqueKey declaration below where uniqueKey is set to id.
   -->
   <field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false" />

and…

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

 Hi All,
 
 I am seeing interesting behavior on the replicas. I have a single
 shard and 6 replicas on SolrCloud 4.10.1, and only a small number of
 documents (~375) that are replicated across the six replicas.
 
 The interesting thing is that the same document has a different id in
 each one of those replicas.
 
 This is causing fq=id:xyz type queries to fail, depending on
 which replica the query goes to.
 
 I have specified the id field in the following manner in schema.xml.
 Is this the right way to specify an auto-generated id in SolrCloud?
 
    <field name="id" type="uuid" indexed="true" stored="true"
    required="true" multiValued="false" />
 
 
 Thanks.



Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
Looking a little deeper, I did find this about UUIDField

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

"NOTE: Configuring a UUIDField instance with a default value of NEW is not
advisable for most users when using SolrCloud (and not possible if the UUID
value is configured as the unique key field) since the result will be that each
replica of each document will get a unique UUID value. Using
UUIDUpdateProcessorFactory
(http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html)
to generate UUID values when documents are added is recommended instead."

That might describe the behavior you saw.  And the use of
UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well here:

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

Though I’ve not actually tried that process before.
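
For reference, a minimal solrconfig.xml sketch of that approach (untested on
my side; the chain name is illustrative, and "id" is assumed to be the
uniqueKey):

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>

Because the UUID is generated once, before the document is distributed to
the replicas, every replica should end up with the same id value.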

On Nov 11, 2014, at 7:39 PM, Garth Grimm garthgr...@averyranchconsulting.com
wrote:

“uuid” isn’t an out of the box field type that I’m familiar with.

Generally, I’d stick with the out of the box advice of the schema.xml file, 
which includes things like….

  <!-- Only remove the id field if you have a very good reason to. While not strictly
       required, it is highly recommended. A uniqueKey is present in almost all Solr
       installations. See the uniqueKey declaration below where uniqueKey is set to id.
  -->
  <field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false" />

and…

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
  -->
 <uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard and 6 replicas on SolrCloud 4.10.1, and only a small number of
documents (~375) that are replicated across the six replicas.

The interesting thing is that the same document has a different id in
each one of those replicas.

This is causing fq=id:xyz type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml.
Is this the right way to specify an auto-generated id in SolrCloud?

   <field name="id" type="uuid" indexed="true" stored="true"
   required="true" multiValued="false" />


Thanks.




Re: DocSet getting cached in filterCache for facet request with {!cache=false}

2014-11-11 Thread Erick Erickson
The first thing I'd try is to stop explicitly _telling_ Solr to use the enum
method by omitting facet.method=enum from your URL ;)...

I'm guessing that the field in question has very few unique values, so you
probably need to do what Yonik suggests.

Erick

On Tue, Nov 11, 2014 at 1:30 PM, Yonik Seeley yo...@heliosearch.com wrote:
 On Tue, Nov 11, 2014 at 1:25 PM, Mohsin Beg Beg mohsin@oracle.com wrote:
 Wiki says fq={!cache=false}*:* is ok, no?

 That's for the filtering... not for the faceting.

 then how do I skip the filterCache for facet.method=enum?

 Specify a high minDF (the min docfreq or number of documents that
 need to match a term before the filter cache will be used).

 facet.enum.cache.minDf=1000

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data
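
For example, a sketch based on the original query (the minDf threshold here
is arbitrary; pick a value above the docfreq of every term in the field):

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum&facet.enum.cache.minDf=1000

With minDf above every term's docfreq, the enum method iterates the terms
without inserting their DocSets into the filterCache.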


Re: SOLRJ Atomic updates of String field

2014-11-11 Thread Anurag Sharma
Sorry, I didn't get what you are trying to achieve or what the issue is.

On Wed, Nov 12, 2014 at 12:20 AM, bbarani bbar...@gmail.com wrote:

 I am using the below code to do partial update (in SOLR 4.2)

 Map<String, Object> partialUpdate = new HashMap<String, Object>();
 partialUpdate.put("set", value); // value holds the new description text
 doc.setField("description", partialUpdate);
 server.add(docs);
 server.commit();

 I am seeing the description value below wrapped in {set=...}. Any idea why
 this is getting added?

 <str name="description">
 {set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
 for faster processing and longer battery life, the M8 motion coprocessor to
 track speed, distance and elevation, and with an 8MP iSight camera, you can
 record 1080p HD Video at 60 FPS!}
 </str>



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLRJ Atomic updates of String field

2014-11-11 Thread Ahmet Arslan
Hi Bbarani,

A partial update SolrJ example can be found at:
http://find.searchhub.org/document/5b1187abfcfad33f

Ahmet



On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote:
I am using the below code to do partial update (in SOLR 4.2)

Map<String, Object> partialUpdate = new HashMap<String, Object>();
partialUpdate.put("set", value); // value holds the new description text
doc.setField("description", partialUpdate);
server.add(docs);
server.commit();

I am seeing the description value below wrapped in {set=...}. Any idea why
this is getting added?

<str name="description">
{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip
for faster processing and longer battery life, the M8 motion coprocessor to
track speed, distance and elevation, and with an 8MP iSight camera, you can
record 1080p HD Video at 60 FPS!}
</str>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html
Sent from the Solr - User mailing list archive at Nabble.com.
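
For completeness, a minimal SolrJ atomic-update sketch along the lines
discussed above (assumes Solr 4.x with updateLog enabled in solrconfig.xml
and "id" as the uniqueKey; the id and text values are illustrative):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "12345"); // uniqueKey of the document being updated
doc.addField("description", Collections.singletonMap("set", "new text")); // atomic set
server.add(doc);
server.commit();

If the stored field later shows the literal {set=...} map, the update was
indexed as a plain value instead of being applied atomically; common causes
are a missing updateLog in solrconfig.xml or a document sent without its
uniqueKey.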



DIH Blob data

2014-11-11 Thread Rahul
I am trying to index JSON data stored in a blob column in a database.
The JSON is stored in the database as {"a":1,"b":2,"c":3}.

I want to search on those fields later, e.g. fq=a:1.
The fields a, b, c are dynamic and can be anything, based on the data posted
by users.

What is the correct way to index data based on dynamic fields in Solr and
search on those fields later?

-- 

Rahul Ranjan
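
One common approach (a sketch, not from this thread): flatten the JSON during
import and map its keys onto a dynamicField pattern in schema.xml, for example

<dynamicField name="*_s" type="string" indexed="true" stored="true" />

so that {"a":1,"b":2,"c":3} is indexed as a_s=1, b_s=2, c_s=3 and can be
filtered later with fq=a_s:1. The *_s suffix convention is illustrative; any
pattern works as long as the import side and the queries agree on it.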