space issue in search results

2014-04-28 Thread PAVAN
I have indexed titles such as the following:

honda cars in rajaji nagar
honda cars in rajajinagar

Suppose I search for

honda cars in rajajinagar (OR)
honda cars in rajaji nagar

Either query should return both results.

Can anybody help me with how to do this?








--
View this message in context: 
http://lucene.472066.n3.nabble.com/space-issue-in-search-results-tp4133421.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: space issue in search results

2014-04-28 Thread Gora Mohanty
On 28 April 2014 12:42, PAVAN pavans2...@gmail.com wrote:

 I have indexed title in the following way.

 honda cars in rajaji nagar
 honda cars in rajajinagar.

 suppose if i search for

 honda cars in rajainagar (OR)
 honda cars in rajaji nagar

 it has to display both the results.

Please do not start multiple threads with the same question.

The straightforward way to do what you want is to use synonyms:
  rajaji nagar, rajajinagar
as presumably you want to collapse spaces only for things like
place names.
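
As a rough illustration of why a synonym pair makes both spellings match, here is a toy Python model of synonym expansion. It is not Solr code; the SYNONYMS list and the function names are made up for this sketch, and a real setup would put the pair in the synonyms file used by the field's analyzer.

```python
# Toy model of synonym expansion; this is not how Solr implements it.
SYNONYMS = [("rajaji nagar", "rajajinagar")]


def expand(text):
    """Return the lowercased text plus every variant made by swapping synonyms."""
    variants = {text.lower()}
    for left, right in SYNONYMS:
        for variant in list(variants):
            if left in variant:
                variants.add(variant.replace(left, right))
            if right in variant:
                variants.add(variant.replace(right, left))
    return variants


def matches(query, title):
    """A query matches a title when their expanded forms overlap."""
    return bool(expand(query) & expand(title))
```

With this model, either spelling of the query matches either spelling of the title, which is the behaviour asked for above.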

Regards,
Gora


merge shards indexes

2014-04-28 Thread Gastone Penzo
Hi,
Is it possible to merge two shard indexes into one?

Thank you

-- 
*Gastone Penzo*


Re: Application of different stemmers / stopword lists within a single field

2014-04-28 Thread Manuel Le Normand
Why not take advantage of your use case? The characters of the two
languages belong to different character classes.

You can index this text into a single Solr field (no copyField) and apply an
analysis chain that includes the analysis for both languages: stopwords,
stemmers, etc.
As each filter should only affect its own language (e.g. an Arabic
stemmer should not stem a Latin word), you can run cross-language searches
on this single field.


On Mon, Apr 28, 2014 at 5:59 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 If you can throw money at the problem:
 http://www.basistech.com/text-analytics/rosette/language-identifier/ .
 Language Boundary Locator at the bottom of the page seems to be
 part/all of your solution.

 Otherwise, specifically for English and Arabic, you could play with
 Unicode ranges to try detecting text blocks:
 1) Create an UpdateRequestProcessor chain that
 a) clones text into field_EN and field_AR.
 b) applies regular expression transformations that strip English or
 Arabic unicode text range correspondingly, so field_EN only has
 English characters left, etc. Of course, you need to decide what you
 want to do with occasional EN or neutral characters happening in the
 middle of Arabic text (numbers: Arabic or Indic? brackets, dashes,
 etc). But if you just index text, it might be ok even if it is not
 perfect.
 c) deletes empty fields, just in case not all of them have mix language
 2) Use eDismax to search over both fields, each with its own processor.
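
 Steps (a) to (c) above can be sketched outside Solr in a few lines of Python. This is only an illustration of the regular-expression idea, not an UpdateRequestProcessor. It assumes that "Arabic" means the basic Arabic block (U+0600 to U+06FF) and "English" means ASCII letters; digits and punctuation are left in both copies, which is exactly the open question mentioned in (b).

```python
import re

# Assumption for this sketch: basic Arabic block only, ASCII letters only.
ARABIC = re.compile(r"[\u0600-\u06FF]+")
LATIN = re.compile(r"[A-Za-z]+")


def split_fields(text):
    """Clone text into field_EN / field_AR, strip the other script, drop empties."""
    candidates = {
        "field_EN": " ".join(ARABIC.sub(" ", text).split()),
        "field_AR": " ".join(LATIN.sub(" ", text).split()),
    }
    # Step (c): delete fields that ended up empty.
    return {name: value for name, value in candidates.items() if value}
```

 A document containing only English text would then carry field_EN alone, and eDismax can search over whichever fields exist.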

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 25, 2014 at 5:34 PM, Timothy Hill timothy.d.h...@gmail.com
 wrote:
  This may not be a practically solvable problem, but the company I work
 for
  has a large number of lengthy mixed-language documents - for example,
  scholarly articles about Islam written in English but containing lengthy
  passages of Arabic. Ideally, we would like users to be able to search
 both
  the English and Arabic portions of the text, using the full complement of
  language-processing tools such as stemming and stopword removal.
 
  The problem, of course, is that these two languages co-occur in the same
  field. Is there any way to apply different processing to different words
 or
  paragraphs within a single field through language detection? Is this to
 all
  intents and purposes impossible within Solr? Or is another approach
 (using
  language detection to split the single large field into
  language-differentiated smaller fields, for example)
 possible/recommended?
 
  Thanks,
 
  Tim Hill



Solr Cloud and Replication request handler

2014-04-28 Thread Amanjit Gill
Hi everybody,

Considering a solr cloud configuration (4.6+)

a) I am wondering if the solr replication handler always has to be
configured completely, aka by choosing one master, then setting the config
accordingly (enable, masterUrl) etc ...  Do we really need a replication
master?

solrconfig.xml excerpt

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">true</str>  <!-- true on master instance -->
    [..]
  </lst>
  <lst name="slave">
    <str name="enable">false</str> <!-- true on slave instance -->
    <str name="masterUrl">http://mysolrinstance::port/default/replication</str>
    [..]
  </lst>
</requestHandler>

b) what happens to the cloud if the master instance goes down?

Thanks for your info ...

All the best,
Amanjit


Re: Solr Cluster management having too many cores

2014-04-28 Thread Mukesh Jha
Thanks Erik,

Sounds about right.

BTW how long can I keep adding collections i.e. can I keep 5/10 years data
like this?

Also what do you think of bullet 2) of having collection specific
configurations in zookeeper?


On Fri, Apr 25, 2014 at 11:44 PM, Erick Erickson erickerick...@gmail.com wrote:

 So you're talking about 700 or so collections. That should be do-able,
 especially as Solr is rapidly evolving to handle more and more
 collections and there's two years for that to happen.

 The aging out bit is manual (well, you'd script it I suppose). So
 every day there'd be a script that ran and just knew the right
 collection to change the alias on, there's nothing automatic yet.
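
 A daily script of that kind might look roughly like the Python sketch below. It only builds the Collections API CREATEALIAS request that would repoint the alias at the newest collections; it does not send it. The daily naming scheme (sample_collection_YYYY_MM_DD), the base URL, and the function names are assumptions made for the example.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_collections(today, days_to_keep, prefix="sample_collection"):
    """Names of the newest `days_to_keep` daily collections, newest first."""
    return [
        f"{prefix}_{(today - timedelta(days=n)):%Y_%m_%d}"
        for n in range(days_to_keep)
    ]


def create_alias_url(base_url, alias, collections):
    """Build the CREATEALIAS request that points the alias at `collections`."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # Keep commas readable instead of percent-encoding them.
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"
```

 Collections that fall out of the window are no longer searched through the alias; deleting them would be a separate step.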

 Best,
 Erick

 On Fri, Apr 25, 2014 at 9:37 AM, Mukesh Jha me.mukesh@gmail.com
 wrote:
  Thanks for quick reply Erik,
 
  I want to keep my collections till I run out of hardware, which is at
 least
  a couple of years worth data.
  I'd like to know more on ageing out aliases, did a quick search but
 didn't
  find much.
 
 
  On Fri, Apr 25, 2014 at 9:45 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Hmmm, tell us a little more about your use-case. In particular, how
  long do you need to keep the data around? Days? Months? Years?
 
  Because if you only need to keep the data for a specified period, you
  can use the collection aliasing process to age-out collections and
  keep the number of cores from growing too large.
 
  Best,
  Erick
 
  On Fri, Apr 25, 2014 at 6:49 AM, Mukesh Jha me.mukesh@gmail.com
  wrote:
   Hi Experts,
  
   I need to divide my indexes based on hour/day with each index having
  ~50-80
   GB data  ~50-80 mill docs, so I'm planning to create daily collection
  with
   names e.g. *sample_colledction__mm_dd_hh.*
   I'll also create an alias *sample_collection* and update it whenever I
  will
   create a new collection so that the entire data set is searchable.
  
   I've a couple of question on the above design
   1) How far can it scale? As my collections will increase (so will the
   shards  replicas) do we have a breaking point when adding
 more/searching
   will become an issue?
   2) As my cluster will grow because of huge number of collections the
   clusterstate.json file present in zookeeper will grow too, won't this
 be
  a
   limiting factor? If so instead of storing all this info in one
   clusterstate.json file shouldn't Solr save cluster specific details in
  this
   file  have collection specific config files present on zookeeper?
   3) How can I easily manage all these collections? Do we have Java
  Coreadmin
   API's available. I cannot find much documented on it.
  
   --
   Txz,
  
   *Mukesh Jha me.mukesh@gmail.com*
 
 
 
 
  --
 
 
  Thanks & Regards,
 
  *Mukesh Jha me.mukesh@gmail.com*




-- 


Thanks & Regards,

*Mukesh Jha me.mukesh@gmail.com*


Re: merge shards indexes

2014-04-28 Thread Dmitry Kan
Yes, according to this documentation:
https://wiki.apache.org/solr/MergingSolrIndexes
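
One of the approaches described on that page is the CoreAdmin mergeindexes action. A small Python sketch that builds such a request is shown below. The core names and base URL are placeholders, and the sketch assumes the target core already exists and that all cores live on the same Solr instance.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, source_cores):
    """Build a CoreAdmin mergeindexes request; srcCore is repeated per source."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("srcCore", name) for name in source_cores]
    return f"{base_url}/admin/cores?{urlencode(params)}"
```

A commit on the target core is usually needed afterwards before the merged documents become visible.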


On Mon, Apr 28, 2014 at 12:14 PM, Gastone Penzo gastone.pe...@gmail.com wrote:

 Hi,
 it's possible to merge 2 shards indexes into one?

 Thank you

 --
 *Gastone Penzo*




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Solr Cloud and Replication request handler

2014-04-28 Thread Shawn Heisey
On 4/28/2014 3:33 AM, Amanjit Gill wrote:
 Hi everybody,
 
 Considering a solr cloud configuration (4.6+)
 
 a) I am wondering if the solr replication handler always has to be
 configured completely, aka by choosing one master, then setting the config
 accordingly (enable, masterUrl) etc ...  Do we really need a replication
 master?


You simply need the replication handler to be present with a name of
/replication for SolrCloud to work properly.  You do not need to
configure it for master or slave.  SolrCloud will take care of
configuring which instance needs to be a slave whenever it needs to
recover an index.  You literally just need one line in your solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler" />

Thanks,
Shawn



Re: Stemming not working with wildcard search

2014-04-28 Thread Geepalem
Can someone please help me with this? I am stuck on this issue.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Geepalem
Can someone please help me with this? I am stuck on this issue.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud and Replication request handler

2014-04-28 Thread Amanjit Gill
Hello Shawn,

Thanks for your reply, that's good news!

All the best.


2014-04-28 15:28 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 4/28/2014 3:33 AM, Amanjit Gill wrote:
  Hi everybody,
 
  Considering a solr cloud configuration (4.6+)
 
  a) I am wondering if the solr replication handler always has to be
  configured completely, aka by choosing one master, then setting the
 config
  accordingly (enable, masterUrl) etc ...  Do we really need a replication
  master?


 You simply need the replication handler to be present with a name of
 /replication for SolrCloud to work properly.  You do not need to
 configure it for master or slave.  SolrCloud will take care of
 configuring which instance needs to be a slave whenever it needs to
 recover an index.  You literally just need one line in your solrconfig.xml:

   <requestHandler name="/replication" class="solr.ReplicationHandler" />

 Thanks,
 Shawn




Re: Solr Cluster management having too many cores

2014-04-28 Thread Shawn Heisey
On 4/28/2014 5:05 AM, Mukesh Jha wrote:
 Thanks Erik,
 
 Sounds about right.
 
 BTW how long can I keep adding collections i.e. can I keep 5/10 years data
 like this?
 
 Also what do you think of bullet 2) of having collection specific
 configurations in zookeeper?

Regarding bullet 2, there is work underway right now to create a
separate clusterstate within zookeeper for each collection.  I do not
know how far along that work is.

There are no hard limits in SolrCloud at all.  The things that will
cause issues with scalability are resource-related problems.  You'll
exceed the 1MB default limit on a zookeeper database pretty quickly.  If
you're not using the example jetty included with Solr, you'll exceed the
default maxThreads on most servlet containers very quickly.  You may run
into problems with the default limits on Solr's HttpShardHandler.

Running hundreds or thousands of cores efficiently will require lots of
RAM, both for the OS disk cache and the java heap.  A large java heap
will require significant tuning of Java garbage collection parameters.

Most operating systems limit a user to 1024 open files and 1024 running
processes (which includes threads).  These limits will need to be increased.
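
To see what the current limits are for the user that runs Solr, something like the Python sketch below can be used on a Unix-like system. It relies on the standard resource module, which is not available on Windows; the function name and the dictionary keys are just choices made for this example.

```python
import resource


def current_limits():
    """Return (soft, hard) limits for open files and, where available, processes."""
    limits = {"open_files": resource.getrlimit(resource.RLIMIT_NOFILE)}
    # RLIMIT_NPROC is not defined on every platform.
    if hasattr(resource, "RLIMIT_NPROC"):
        limits["processes"] = resource.getrlimit(resource.RLIMIT_NPROC)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

The values reported are those of the process running the script, so it should be run as the same user and from the same kind of session as Solr itself.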

There may be other limits imposed by the Solr config, Java, and/or the
operating system that I have not thought of or stated here.

Thanks,
Shawn



Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Jack Krupansky
Wildcard query only works for single terms. Any embedded special characters 
will cause a term to be split into multiple terms at index time. The use of 
a wildcard in a query term with embedded special characters will bypass 
normal analysis - you need to enter the term exactly as it would be analyzed 
at index time for wildcard to work.


Ditto if your field type uses the word delimiter filter with the split 
digits option enabled - the alpha and numeric portions will generate 
separate terms - and cause a wildcard to fail.
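
The effect can be pictured with a toy model in Python. The analyzer here is deliberately crude (lowercase, then split on anything that is not a letter or digit) and only stands in for whatever the real field type does; the function names are made up for the sketch.

```python
import re
from fnmatch import fnmatchcase


def index_terms(text):
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [term for term in re.split(r"[^a-z0-9]+", text.lower()) if term]


def wildcard_hits(pattern, text):
    """The wildcard pattern is not analyzed; it is compared with single indexed terms."""
    return [term for term in index_terms(text) if fnmatchcase(term, pattern.lower())]
```

In this model "an-138" is indexed as the two terms "an" and "138", so the pattern an-13* has no single term to match, while 13* does.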


-- Jack Krupansky

-----Original Message----- 
From: Geepalem

Sent: Sunday, April 27, 2014 3:30 PM
To: solr-user@lucene.apache.org
Subject: Wildcard search not working with search term having special 
characters and digits


Hi,

The query below, without a wildcard, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But the query below, with a wildcard, does not return results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

The query below, with a wildcard and no digits, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried adding the WordDelimiterFilter, but with no luck.
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>


Please suggest how to make wildcard search work with special
characters and digits.

I would appreciate an immediate response!

Thanks,
G. Naresh Kumar






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Stemming not working with wildcard search

2014-04-28 Thread Jack Krupansky
Wildcards and stemming are incompatible at query time - you need to manually 
stem the term before applying your wildcard.


Wildcards are not supported in quoted phrases. They will be treated as 
punctuation, and ignored by the standard tokenizer or the word delimiter 
filter.
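
A toy Python model of the first point: the index holds stemmed terms, the wildcard term is not stemmed, so the two only meet if the term is stemmed by hand first. The toy_stem function is a stand-in that strips a trailing "s"; it is not the Snowball or Porter algorithm, and all names here are invented for the sketch.

```python
from fnmatch import fnmatchcase


def toy_stem(term):
    """Stand-in for a real stemmer: strip one trailing 's' from longer words."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def indexed(title):
    """Index-time analysis in this model: lowercase, split on spaces, stem."""
    return [toy_stem(token) for token in title.lower().split()]


def wildcard_hits(pattern, title, stem_first=False):
    """Match *term* against indexed terms, optionally stemming the term by hand."""
    core = pattern.strip("*").lower()
    if stem_first:
        core = toy_stem(core)
    return [term for term in indexed(title) if fnmatchcase(term, f"*{core}*")]
```

Here *products* finds nothing in "Old products" because only "product" is in the index, while stemming the term first makes the wildcard match.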


-- Jack Krupansky

-----Original Message----- 
From: Geepalem

Sent: Sunday, April 27, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Stemming not working with wildcard search

Hi,

I have added the SnowballPorterFilterFactory filter to the field type so that
singular and plural search terms return the same results.

The queries below (with double quotes around the search term) return similar
results, which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I analyzed the results, neither result set included documents that
don't start with the words "Product" or "Products", though a few such
documents exist.

So I added * as a prefix and suffix to the search term, without double quotes,
to do a wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now stemming does not work: the second query above does not return results
similar to those of the first.

If double quotes are added around the search term, the results are similar
but not as expected. With double quotes it won't return
results like "Old products", "New products", "Cool Product".
It will only return results with values like "Product 1", "Product
2", "Products of USA".

Please suggest how to make stemming work with wildcard search.


Appreciate immediate response!!

Thanks,
G. Naresh Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com. 



how to write my first solr query

2014-04-28 Thread Evan Smith
Hello,

I would like to find all documents that contain, say, "foo bar", with a filter to
remove any cases where "foo bar" is prefixed with things like "cat", "a",
...

I am OK with a document that has both "cat foo bar" and "foo bar", but if it
only has "cat foo bar" then I don't want it, while if it has "foo bar" I want
it.

I looked at span queries but could not work out how to phrase
this.

Any pointers would be great!

Thank you in advance,
Evan




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-write-my-first-solr-query-tp4133509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemming not working with wildcard search

2014-04-28 Thread Ahmet Arslan
Hi Naresh,

Quotes are only meaningful when there are two or more terms. Don't use quotes 
for products* and product*.

Regarding stemming and wildcards, use the following chain, and your wildcard 
searches will be happier.

<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
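
What this chain emits can be sketched in Python as follows. This is a simplified model, not the Lucene implementation: the keyword-repeat step keeps an unstemmed copy of each token next to the stemmed one, and the remove-duplicates step drops the second copy when stemming changed nothing. toy_stem is a stand-in for the Porter stemmer.

```python
def toy_stem(term):
    """Stand-in for the Porter stemmer: strip one trailing 's' from longer words."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def keyword_repeat_chain(tokens):
    """Return (position, term) pairs: the original term plus its stem, deduplicated."""
    output = []
    for position, token in enumerate(tokens):
        emitted = []
        # First the keyword (unstemmed) copy, then the stemmed copy.
        for candidate in (token, toy_stem(token)):
            if candidate not in emitted:
                emitted.append(candidate)
        output.extend((position, term) for term in emitted)
    return output
```

With both "products" and "product" present at the same position, an unstemmed wildcard term such as *products* still has an indexed term to match.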

Ahmet




Re: how to write my first solr query

2014-04-28 Thread Ahmet Arslan


Hi Evan,

Confusing use case :)

You don't want "foo bar" when it is prefixed with "cat"?

But you are OK with a document that has "cat foo bar"?

Isn't this a contradiction?






[ANNOUNCE] Apache Solr 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is
  Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
  have known JVM bugs affecting Solr).

* Apache Solr is fully compatible with Java 8.

* <fields> and <types> tags have been deprecated from schema.xml.
  There is no longer any reason to keep them in the schema file,
  they may be safely removed. This allows intermixing of <fieldType>,
  <field> and <copyField> definitions if desired.

* The new {!complexphrase} query parser supports wildcards, ORs etc.
  inside Phrase Queries. 

* New Collections API CLUSTERSTATUS action reports the status of
  collections, shards, and replicas, and also lists collection
  aliases and cluster properties.
 
* Added managed synonym and stopword filter factories, which enable
  synonym and stopword lists to be dynamically managed via REST API.

* JSON updates now support nested child documents, enabling {!child}
  and {!parent} block join queries. 

* Added ExpandComponent to expand results collapsed by the
  CollapsingQParserPlugin, as well as the parent/child relationship
  of nested child documents.

* Long-running Collections API tasks can now be executed
  asynchronously; the new REQUESTSTATUS action provides status.

* Added a hl.qparser parameter to allow you to define a query parser
  for hl.q highlight queries.

* In Solr single-node mode, cores can now be created using named
  configsets.

* New DocExpirationUpdateProcessorFactory supports computing an
  expiration date for documents from the TTL expression, as well as
  automatically deleting expired documents on a periodic basis. 

Solr 4.8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/




Re: SpanQuery with Boolean Queries

2014-04-28 Thread Vijay Kokatnur
Pretty neat. Thanks!


On Fri, Apr 25, 2014 at 2:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 I am not sure how OR clauses are executed.

 But after re-reading your mail, I think you can use SpanOrQuery (for your
 q1) in your custom query parser plugin.

 val q2 = new SpanOrQuery(
   new SpanTermQuery(new Term("BookingRecordId", "ID_1")),
   new SpanTermQuery(new Term("BookingRecordId", "ID_N"))
 )




 On Friday, April 25, 2014 3:22 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Thanks Ahmet. It worked!

 Does solr execute these nested queries in parallel?



 On Thu, Apr 24, 2014 at 12:53 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Vijay,
 
  May be you can use _query_ hook?
 
  _query_:{!span}BookingRecordId:234 OrderLineType:11 OR _query_:{!span}
  OrderLineType:13 + BookingRecordId:ID_N
 
  Ahmet
 
 
  On Thursday, April 24, 2014 9:34 PM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  Hi,
 
  I have defined a SpanQuery for proximity search like -
 
  val q1 = new SpanTermQuery(new Term("BookingRecordId", "234"))
  val q2 = new SpanTermQuery(new Term("OrderLineType", "11"))
  val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
  val sp = Array[SpanQuery](q1, q2m)
 
  val q = new SpanNearQuery(sp, -1, false)
 
  Query:
  *fq={!span} BookingRecordId: 234+OrderLineType11*
 
  However, I need to look up by multiple BookingRecordIds with an OR -
 
  *fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR
  BookingRecordId:ID_N)*
 
  I can't specify multiple *span* in the same query like -
 
  *{!span} OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}
  OrderLineType:13 + BookingRecordId:ID_N*
 
  Is there any recommended to way to achieve this?
  Thanks, Vijay
 
 




Delete fields from document using a wildcard

2014-04-28 Thread Costi Muraru
Hi guys,

Would it be possible, using Atomic Updates in Solr 4, to remove all fields
matching a pattern? For instance, something like:

<add><doc>
  <field name="id">100</field>
  <field name="*_name_i" update="set" null="true"></field>
</doc></add>

Or something similar to remove certain fields in all documents.
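
If the update syntax turns out not to accept a pattern in the field name, one client-side workaround would be to expand the pattern against the field names of a fetched document and send an explicit atomic update. The Python sketch below only builds the JSON payload, using the "set to null" form of an atomic update; the sample field names and the function name are made up for illustration.

```python
import json
from fnmatch import fnmatchcase


def removal_update(doc, pattern, id_field="id"):
    """Build an atomic update that nulls every field whose name matches pattern."""
    update = {id_field: doc[id_field]}
    for name in doc:
        if name != id_field and fnmatchcase(name, pattern):
            update[name] = {"set": None}
    return update


if __name__ == "__main__":
    stored = {"id": "100", "first_name_i": 1, "last_name_i": 2, "title_s": "x"}
    # The update handler expects a JSON list of documents.
    print(json.dumps([removal_update(stored, "*_name_i")]))
```

Doing this for all documents means one fetch and one update per document, so it is only practical for modest index sizes.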

Thanks,
Costi


Re: how to write my first solr query

2014-04-28 Thread Evan Smith
Hello,

Here is a better use case.

Documents A, B, C, and D:

A: dear foo bar hello
B: dear cat foo bar hello
C: dear cat foo bar hello foo bar
D: dear car foo bar

I have a dictionary of items outside of Solr:
"foo bar" and "cat foo bar"
Associated with each item is the set of longer items that end with it,
so I know that "foo bar" is the suffix of "cat foo bar".

I would like to search my corpus of documents A, B, C, and D
and get just the documents that contain "foo bar", not the ones that only contain
"cat foo bar".

So if I search for "foo bar" but not "cat foo bar",
I want to get documents A, C, and D,
but not B, which has no standalone "foo bar", only "cat foo bar".
I am OK with C, as it has a "foo bar" that is not prefixed with "cat".

Does this make sense?  I see that ("foo bar" AND NOT "cat foo bar")
would not work, as it would miss document C.  Or at least I think it would.

Evan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-write-my-first-solr-query-tp4133509p4133537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Geepalem
Thanks, Jack, for the prompt response!

So is there any solution to make this scenario work? 
Or do wildcards not work with special characters and numerics?

Thanks,
G. Naresh Kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: zkCli zkhost parameter

2014-04-28 Thread Scott Stults
I did, but it looks like I mixed in the chroot too after every entry rather
than once at the very end (thanks to David Smiley for catching that). I'll
try again and update if it's still a problem.
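
For anyone else hitting this, the intended shape of the value can be sketched in Python as below. The host names and the /solr chroot are placeholders.

```python
def zk_host(hosts, chroot=""):
    """Join host:port entries with commas and append the chroot once, at the end."""
    return ",".join(hosts) + chroot
```

So three ZooKeeper nodes with a /solr chroot become zk1:2181,zk2:2181,zk3:2181/solr rather than having /solr repeated after each entry.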

Thanks!
-Scott




On Sat, Apr 26, 2014 at 1:08 PM, Mark Miller markrmil...@gmail.com wrote:

 Have you tried a comma-separated list or are you going by documentation?
 It should work.
 --
 Mark Miller
 about.me/markrmiller

 On April 26, 2014 at 1:03:25 PM, Scott Stults (
 sstu...@opensourceconnections.com) wrote:

 It looks like this only takes a single host as its value, whereas the
 zkHost environment variable for Solr takes a comma-separated list.
 Shouldn't the client also take a comma-separated list?

 k/r,
 Scott




-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Stemming not working with wildcard search

2014-04-28 Thread Geepalem
Hi Ahmet,

Thanks for your prompt response!

I have added the filters you specified, but it is still not working.
Below is the field's query analyzer:

 <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory" />
   <filter class="solr.LowerCaseFilterFactory" />

   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.PorterStemFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>

http://localhost:8080/solr/master/select?q=page_title_t:*products*
http://localhost:8080/solr/master/select?q=page_title_t:*product*


Please let me know if I am doing anything wrong.

Thanks,
G. Naresh Kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133556.html
Sent from the Solr - User mailing list archive at Nabble.com.


saving user actions on item in solr for later retrieval

2014-04-28 Thread nolim
Hi,
We are using Solr in a production system with around 500 users, and we have
around ~1 queries per day.
Our users' search topics are mostly static and repeat themselves over
time. 

Our system has an option to specify a specific search subject (we
also call it a specific information need), and most of our users use
this option.
For each information need, we log every query and every document retrieved,
and the user can also give feedback on whether a document is relevant to his
information need.

We also have a special query expansion technique and a diversity algorithm based
on MMR.

We want to use this information from the logs as a data set for training our
ranking system
and performing Learning To Rank for each information need or cluster of
information needs.
We also want to give the user the option to filter by "relevant" and "read"
based on his actions or his friends' actions in the same topic.
When he runs the same query again, or a similar one, he can skip documents
already read. That is an important requirement for our users.

We are considering 2 possible implementations:
1. Updating each item in Solr and creating 2 fields named "read" and
"relevant".
Each field is a multivalued field holding the corresponding label of the
information need.
When the user reads a document, an update is sent to Solr and the field
"read" gets a label for
the information need the user is working on...
This causes an update whenever an item is read by a user (still nothing compared to
the new items coming in each day).
We would be saving information that belongs to the application in Solr, which
may be the wrong architecture.

2. Saving the information in a DB, and then filtering the
retrieved results.
This option is much more complicated (we would then have fields that are not in Solr
but that the user uses for search). We won't get facets, autocomplete, and
other nice features that a regular Solr field can have.
It costs performance; we can't easily ask "give me the top 10 documents
that answer the query and are unread for the information need"; and the code is more
complicated to maintain.

3. Do you have more ideas?

Which of those options is the better?
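
To make option 1 concrete, here is a rough Python sketch of the two pieces it needs: an atomic update that adds an information-need label to the multivalued "read" field, and a filter query that hides documents already read for that need. The document id, the label format, and the function names are invented for the example; labels containing special query characters would need escaping.

```python
import json


def mark_read(doc_id, need_label):
    """Atomic update that adds one information-need label to the 'read' field."""
    return {"id": doc_id, "read": {"add": need_label}}


def unread_filter(need_label):
    """Filter query that excludes documents already read for this need."""
    return f"-read:{need_label}"


def select_params(user_query, need_label, rows=10):
    """Parameters for 'top N documents for the query, unread for this need'."""
    return {"q": user_query, "fq": unread_filter(need_label), "rows": rows}


if __name__ == "__main__":
    print(json.dumps([mark_read("doc42", "need_7")]))
    print(select_params("solar panels", "need_7"))
```

The same pattern would apply to the "relevant" field.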

Thanks in advance!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
I have been working with SpanQuery for some time now to look up multivalued
fields, and found one more issue.

Suppose a document has the following lookup fields, among others:

*BookingRecordId*: [ 100268421, 190131, 8263325 ],

*OrderLineType*: [ 13, 1, 11 ],

Here is the query I construct -

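import org.apache.lucene.index.Term
import org.apache.lucene.search.spans.{FieldMaskingSpanQuery, SpanNearQuery, SpanQuery, SpanTermQuery}

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
// Mask q2 so that it reports BookingRecordId as its field.
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)

// slop = -1, inOrder = false
val q = new SpanNearQuery(sp, -1, false)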

Query to find element at first index position works fine -

*{!span} BookingRecordId: 100268421 +OrderLineType:13*
but query to find element at third index position doesn't return any
result. -

*{!span} BookingRecordId: 8263325 +OrderLineType:11 *

If I increase the slop to 4 then it returns the correct result. But it also
matches BookingRecordId: 100268421 with OrderLineType:11, which is incorrect.

I thought SpanQuery works for any multiValued field size.  Any ideas how I
can fix this?
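
For reference, the behaviour being asked for is a "same index" match across two parallel multivalued fields. The following is a minimal sketch of that intended semantics in plain Python, outside Lucene; the function name matches_same_index is hypothetical, and the code does not model how Lucene assigns token positions.

```python
def matches_same_index(booking_ids, order_line_types, booking_id, line_type):
    """Return True only if booking_id and line_type occur at the same
    index of the two parallel multivalued fields."""
    return any(
        b == booking_id and t == line_type
        for b, t in zip(booking_ids, order_line_types)
    )

booking_ids = ["100268421", "190131", "8263325"]
order_line_types = ["13", "1", "11"]

# First index position: should match.
print(matches_same_index(booking_ids, order_line_types, "100268421", "13"))  # True
# Third index position: should also match.
print(matches_same_index(booking_ids, order_line_types, "8263325", "11"))    # True
# Values taken from different index positions: should not match.
print(matches_same_index(booking_ids, order_line_types, "100268421", "11"))  # False
```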

Thanks,
-Vijay


spellcheck.q and local parameters

2014-04-28 Thread Jeroen Steggink
Hi,

I'm having some trouble using the spellcheck.q parameter. The user's query is 
defined in the qq parameter and q parameter contains several other parameters 
for boosting.
I would like to use the qq parameter as a default for spellcheck.q.
I tried several ways of adding the qq parameter in the spellcheck.q parameter, 
but it doesn't seem to work. Is this at all possible or do I need to write a 
custom QueryConverter?

This is the configuration:

<str name="q">_query_:"{!edismax qf=$qfQuery pf=$pfQuery bq=$boostQuery bf=$boostFunction v=$qq}"</str>
<str name="spellcheck.q">{!v=$qq}</str>

I haven't included all the variables, because they seem unnecessary.

Regards,
Jeroen


RE: how to write my first solr query

2014-04-28 Thread Jeroen Steggink
Hi Evan,

If I understand correctly, a document has to have at least one "foo bar"
without having "cat" in front.

A solution would be to use a combination of the ShingleFilterFactory and query
for occurrences of "foo bar" using the termfreq function.

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-ShingleFilter
https://cwiki.apache.org/confluence/display/solr/Function+Queries

The number of shingles depends on how many terms are in the query and how many 
terms cannot be prefixed.

It might be easier to just retrieve all the documents which contain the phrase 
and process the results outside of Solr.
If you could shed some more light on what you are trying to accomplish, maybe 
we can help you find an even better solution to fit your problem.
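
As an illustration of the "process the results outside of Solr" route, here is a minimal sketch of the post-filter. It assumes simple whitespace tokenization and single-term blocked prefixes; the function name has_unprefixed_phrase is hypothetical, and the documents are A to D from the use case quoted below.

```python
def has_unprefixed_phrase(text, phrase, blocked_prefixes):
    """Return True if `phrase` occurs at least once in `text` without being
    immediately preceded by one of the blocked prefix terms."""
    tokens = text.lower().split()
    target = phrase.lower().split()
    blocked = {p.lower() for p in blocked_prefixes}
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Accept the occurrence if nothing precedes it, or if the
            # preceding term is not a blocked prefix.
            if i == 0 or tokens[i - 1] not in blocked:
                return True
    return False

docs = {
    "A": "dear foo bar hello",
    "B": "dear cat foo bar hello",
    "C": "dear cat foo bar hello foo bar",
    "D": "dear car foo bar",
}
hits = sorted(k for k, v in docs.items()
              if has_unprefixed_phrase(v, "foo bar", ["cat"]))
print(hits)  # ['A', 'C', 'D']
```

Document C is kept because its second "foo bar" follows "hello" rather than "cat".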

Jeroen

-Original Message-
From: Evan Smith [mailto:e...@wingonwing.com] 
Sent: maandag 28 april 2014 19:20
To: solr-user@lucene.apache.org
Subject: Re: how to write my first solr query

Hello,

Here is a better use case

Documents A, B, C, and D

A: dear foo bar hello
B: dear cat foo bar hello
C: dear cat foo bar hello foo bar
D: dear car foo bar

I have a dictionary of items outside of solr: "foo bar" and "cat foo bar".
Associated with each item is the set of suffixes of that item,
so I know that "foo bar" has "cat foo bar" as a suffix.

I would like to search my corpus of documents A, B, C and D and just get
documents that contain "foo bar" and not the ones that contain "cat foo bar".

So if I searched on "foo bar" but not "cat foo bar",
I want to get documents A, C, D
but not B, which does not have just "foo bar" but has "cat foo bar".
I am ok with C as it has a "foo bar" that is not prefixed with "cat".

Does this make sense?  I see that the ("foo bar" and not "cat foo bar") would
not work as it would miss document C.  Or at least I think it would.

Evan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-write-my-first-solr-query-tp4133509p4133537.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: spellcheck.q and local parameters

2014-04-28 Thread Dyer, James
spellcheck.q is supposed to take a list of raw query terms, so what you're
trying to do in your example won't work.  What you should do instead is
space-delimit the actual query terms that exist in qq (and nothing else) and
use that for your value of spellcheck.q.
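
A client could do this reduction before sending the request. The following is a rough sketch, not a full query parser: it assumes the user's query arrives as a plain string, replaces common query-syntax characters with spaces (so hyphenated words are split), and drops the boolean operators. The helper names spellcheck_terms and build_params are hypothetical.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

def spellcheck_terms(qq):
    """Reduce a raw user query to space-delimited bare terms."""
    # Replace query-syntax characters with spaces, then keep bare words.
    cleaned = re.sub(r'[+\-!(){}\[\]^"~*?:\\/&|]', " ", qq)
    return " ".join(t for t in cleaned.split() if t not in OPERATORS)

def build_params(qq):
    """Send the user's query as qq and its bare terms as spellcheck.q."""
    return [
        ("qq", qq),
        ("spellcheck", "true"),
        ("spellcheck.q", spellcheck_terms(qq)),
    ]

print(build_params('+hnda AND "civic hybrid"'))
```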

James Dyer
Ingram Content Group
(615) 213-4311





RE: spellcheck.q and local parameters

2014-04-28 Thread Jeroen Steggink
Thanks James, I was afraid of that. The problem is that spellcheck.q is not
always provided by the users and therefore it gives wrong suggestions. I'll
just turn off spellcheck by default.

Cheers,
Jeroen




Re: Issue with SpanQuery

2014-04-28 Thread Ethan
Facing the same problem!! I have noticed it works fine as long as you're
looking up the first index position.

Anyone faced similar problem before?





Re: How to get a list of currently executing queries?

2014-04-28 Thread Otis Gospodnetic
No, though one could write a custom SearchComponent, I imagine.  Not
terribly useful for most situations where queries typically run for only a
few milliseconds, but

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Apr 17, 2014 at 7:34 AM, Nikhil Chhaochharia nikhil...@yahoo.com wrote:

 Hello,

 Is there some way of getting a list of all queries that are currently
 executing?  Something similar to 'show full processlist' in MySQL.

 Thanks,
 Nikhil


Re: Issue with SpanQuery

2014-04-28 Thread Ahmet Arslan
Hi,

Can you paste your field definition of BookingRecordId and OrderLineType? It 
could be something related to positionIncrementGap.

Ahmet







RE: how to write my first solr query

2014-04-28 Thread Evan Smith
Hello,

Thank you!  I will try out what you suggested and post back once I know
more.

yes given things like
cat foo bar
house foo bar
foo bar

I want to know when the term foo bar (but not the prefix cases I specify)
exists in my documents.

Thanks!
Evan




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-write-my-first-solr-query-tp4133509p4133601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Hey Ahmet,

Here is the field def -

<field name="BookingRecordId" type="token" indexed="true" stored="true"
       multiValued="true" omitTermFreqAndPositions="false"/>

<fieldType name="token" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>








Re: Issue with SpanQuery

2014-04-28 Thread Ahmet Arslan
Hi,

I would add positionIncrementGap to the fieldType definitions and experiment
with different values: 0, 1 and 100.


<fieldType name="token" class="solr.TextField" omitNorms="true"
           positionIncrementGap="1">

The same goes for OrderLineType.








Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Thanks Ahmet, I'll give that a try.  Do I need to re-index to add/update
positionIncrementGap?





Re: Issue with SpanQuery

2014-04-28 Thread Ethan
I tried testing with positionIncrementGap but that didn't work.  The values
I passed for it were 0, 1, 4 and 100.

Reindexing also didn't help.







Re: Issue with SpanQuery

2014-04-28 Thread Ahmet Arslan


Hi Vijay,

It is an index-time setting, so yes, a Solr restart and re-indexing are
required. A small test case would be handy.








Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Adding positionIncrementGap="1" to the fields worked for me.  I didn't
re-index all the existing docs, so it works only for future documents.






Indexing an array of maps get transformed to a map

2014-04-28 Thread Jinsu Oh
Our team is upgrading to solr 4.7.0 and running into an issue with
indexing an array of map objects in solr 4.7.0.

I understand that it makes no sense to index an array of map objects
to solr, but I want to figure out why certain error outputs are coming
out of the solr box.

So we have a document structure that goes something like:
{ "id": 1234,
  "url": "abcd",
  "modules": [ { "id": 1, "name": "a" } ]
}

When this goes through the solrj, I receive this error.

[http-bio-8080-exec-9] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:org.apache.solr.common.SolrException: Can't use
SignatureUpdateProcessor with partial update request containing
signature field: url

at 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:159)

at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)

at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)

For some reason, when the SignatureUpdateProcessorFactory receives the
update command, the solr document has become:
{ "id": 1234,
  "url": "abcd",
  "modules": { "id": 1, "name": "a" }
}
Then the processor thinks I'm sending a partial update, when I'm
trying to index a full document. :/

When I trace the code, I can see that I'm creating the SolrInputDocument
with key 'modules' and value '[ { id: 1, name: 1} ]'. But when I call
SolrJ to add it to solr, the document values are transformed...

Does anyone know why this is happening?

-- 
Jinsu Oh


Raw query parameters

2014-04-28 Thread Xavier Morera
Hi,

Would anyone be so kind as to explain what the Raw query parameters are in
Solr's admin UI? I can't find an explanation in the reference guide,
the wiki, or a web search.

[image: Inline image 1]

I'm a bit confused about what it is actually for.
[image: Inline image 3]

Thanks in advance,
Xavier
-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Re: Selectively hiding SOLR facets.

2014-04-28 Thread atuldj.jadhav
Yes, but with my query *country:USA * it is returning me languages
belonging to countries other than USA.
 
Is there any way I can avoid such languages appearing in my facet filters?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Selectively-hiding-SOLR-facets-tp4132770p4133638.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: saving user actions on item in solr for later retrieval

2014-04-28 Thread Alexandre Rafalovitch
Option 1 might be too expensive in terms of commits and the performance of
refreshing the index every time.

3. Have you looked at external fields, custom components, etc.? For example:
http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr
http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-td4040200.html
(past discussion that seems relevant)

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency




Re: Delete fields from document using a wildcard

2014-04-28 Thread Alexandre Rafalovitch
Not out of the box, as far as I know.

Custom UpdateRequestProcessor could possibly do some sort of expansion
of the field name by verifying the actual schema. Not sure if API
supports that level of flexibility. Or, for latest Solr, you can
request the list of known field names via REST and do client-side
expansion instead.
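
The client-side expansion could look roughly like this. It is a sketch under assumptions: the list of field names is taken as already fetched and is hard-coded here with hypothetical names, the helper build_clear_fields_update is hypothetical, and the output uses the JSON form of an atomic update in which setting a field to null removes it.

```python
import fnmatch
import json

def build_clear_fields_update(doc_id, pattern, known_fields):
    """Build an atomic-update document that sets every field whose name
    matches `pattern` to null, which removes it from the document."""
    update = {"id": doc_id}
    for name in known_fields:
        if fnmatch.fnmatchcase(name, pattern):
            update[name] = {"set": None}
    return update

# Hypothetical field list; in practice it would come from the server.
known = ["id", "title", "first_name_i", "last_name_i", "age_i"]
update = build_clear_fields_update("100", "*_name_i", known)
print(json.dumps([update]))
```

The printed JSON array is what would be posted to the update handler; one such document is needed per document id to be changed.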

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 29, 2014 at 12:20 AM, Costi Muraru costimur...@gmail.com wrote:
 Hi guys,

 Would it be possible, using Atomic Updates in SOLR4, to remove all fields
 matching a pattern? For instance something like:

 <add><doc>
   <field name="id">100</field>
   <field name="*_name_i" update="set" null="true"></field>
 </doc></add>

 Or something similar to remove certain fields in all documents.

 Thanks,
 Costi


Re: Raw query parameters

2014-04-28 Thread Shawn Heisey
On 4/28/2014 7:54 PM, Xavier Morera wrote:
 Would anyone be so kind as to explain what the Raw query parameters are
 in Solr's admin UI. I can't find an explanation in the reference
 guide, the wiki, or a web search.

The query API supports a lot more parameters than are shown on the admin
UI.  For instance, if you are doing a faceted search, there are only
boxes for facet.query, facet.field, and facet.prefix ... but faceted
search supports a lot more parameters (like facet.method, facet.limit,
facet.mincount, facet.sort, etc).  Raw Query Parameters gives you a way
to use the entire query API, not just the few things that have UI input
boxes.
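
To illustrate the idea, here is a minimal sketch of how extra parameters typed as key1=val1&key2=val2 end up alongside the ones that have their own boxes. It assumes that input format and a simple append; the function name merge_raw_params and the field name country are hypothetical, and this is not the admin UI's actual code.

```python
from urllib.parse import urlencode, parse_qsl

def merge_raw_params(form_params, raw):
    """Append the parameters typed into a raw-parameter box
    (key1=val1&key2=val2) to those coming from the form fields."""
    return list(form_params) + parse_qsl(raw, keep_blank_values=True)

# Parameters that have their own input boxes.
form = [("q", "*:*"), ("facet", "true"), ("facet.field", "country")]
# Facet options without a box, supplied as raw parameters.
params = merge_raw_params(form, "facet.limit=5&facet.mincount=1")
print(urlencode(params))
```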

Thanks,
Shawn