RE: Workaround needed to sort on Multivalued fields indexed in SOLR

2012-05-17 Thread Bob Sandiford
How are you hoping that Sort will work on a multivalued field?  Normally, 
trying to do this makes no sense.

For example, if you have two authors for a document:
Smith, John
Jones, Joe

Then would you expect the document to sort under 'S' for Smith, or 'J' for 
Jones?  There's probably not a specific rule to choose one or the other, at 
least not in a generic sense.

If you wanted (for example) to be able to sort by the first author, then you 
could index just the first author in a separate, non-multivalued field, purely 
for the sort (while still having all the authors in your multivalued field)
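As a sketch, the schema could carry both fields. The field names here are illustrative; note that copyField won't do this job, since it copies every value, so the indexing client would populate the sort field with just the first author:

```xml
<!-- All authors, for searching and display (multivalued) -->
<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- First author only, populated by the indexing client, purely for sorting -->
<field name="author_sort" type="string" indexed="true" stored="false" multiValued="false"/>
```

A query could then use sort=author_sort asc while still searching and returning the full authors field.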

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com
 
Join the conversation: Like us on Facebook! Follow us on Twitter!


-Original Message-
From: Anupam Bhattacharya [mailto:anupam...@gmail.com] 
Sent: Thursday, May 17, 2012 1:13 AM
To: solr-user@lucene.apache.org
Subject: Workaround needed to sort on Multivalued fields indexed in SOLR

I have indexed many documents which has a field for authors which is 
multivalued.

<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>

How can I sort / order by on this kind of multivalued field? Please suggest a 
workaround.

Thanks
Anupam



RE: Does Solr fit my needs?

2012-04-27 Thread Bob Sandiford
Without speaking directly to the indexing and searching of the specific fields, 
it is certainly possible to retrieve the xml file.  While Solr isn't a DB, it 
does allow a binary field to be associated with an index document.  We store a 
GZipped XML file in a binary field and retrieve that under certain conditions 
to get at original document information.  We've found that Solr can handle 
these much faster than our DB can.  (We regularly index a large portion of 
our documents, and the XML files are prone to frequent changes).  If you DO 
keep such a blob in your Solr index, make sure you retrieve that field ONLY 
when you really want it...

Now - if your XML files are relatively static (i.e. only change rarely, or only 
have new ones) then it still might make sense to use a real DB to store those, 
and just keep the primary key to the DB row in the Solr index.
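The gzip round-trip described above can be sketched with the JDK alone; this is an illustrative sketch of the approach, not SirsiDynix's actual code, and the wiring into a Solr "binary" field (including its base64 handling) is left to the indexing client:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Gzip an XML document for storage in a Solr binary field, and restore it
// after retrieval.
public class XmlBlobCodec {

    // Compress the XML text to gzip bytes suitable for a binary field.
    public static byte[] compress(String xml) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reverse the compression after fetching the stored field back from Solr.
    public static String decompress(byte[] blob) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = gz.read(buf)) != -1; ) {
                bos.write(buf, 0, n);
            }
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```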

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

Register for the 2012 COSUGI User Group Conference today for early bird pricing!
May 2-5 at Disney's Coronado Springs Resort - Lake Buena Vista, Florida
 


-Original Message-
From: G.Long [mailto:jde...@gmail.com] 
Sent: Friday, April 27, 2012 10:32 AM
To: solr-user@lucene.apache.org
Subject: Does Solr fit my needs?

Hi there :)

I'm looking for a way to save xml files into some sort of database and i'm 
wondering if Solr would fit my needs.
The xml files I want to save have a lot of child nodes which also contain child 
nodes with multiple values. The depth level can be more than 10.

After having indexed the files, I would like to be able to query for subparts 
of those xml files and be able to reconstruct them as xml files with all their 
children included. However, I'm wondering if it is possible with an index like 
solr lucene to keep or easily recover the structure of my xml data?

Thanks for your help,

Regards,

Gary




RE: UTF-8 encoding

2012-03-29 Thread Bob Sandiford
Hi, Henri.

Make sure that the container in which you are running Solr is also set for 
UTF-8.

For example, in Tomcat, in the server.xml file, your Connector definitions 
should include:
URIEncoding="UTF-8"
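In context, the Connector element might look like the following; the port and other attribute values are illustrative Tomcat defaults, not values from this thread:

```xml
<!-- Tomcat server.xml: URIEncoding makes GET query parameters decode as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```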

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: henri.gour...@laposte.net [mailto:henri.gour...@laposte.net]
 Sent: Thursday, March 29, 2012 10:42 AM
 To: solr-user@lucene.apache.org
 Subject: UTF-8 encoding
 
 I can't get UTF-8 encoding to work!
 
 I have <str name="v.contentType">text/html;charset=UTF-8</str>
 
 in my request handler, and
 input.encoding=UTF-8
 output.encoding=UTF-8
 in velocity.properties, in various locations (I may have the wrong ones! at
 least in the folder where the .vm files reside)
 
 What else should I be doing/configuring.
 
 Thanks
 Henri
 
 --
 View this message in context: http://lucene.472066.n3.nabble.com/UTF-8-
 encoding-tp3867885p3867885.html
 Sent from the Solr - User mailing list archive at Nabble.com.




RE: Best Solr escaping?

2011-09-26 Thread Bob Sandiford
I won't guarantee this is the 'best algorithm', but here's what we use.  (This 
is in a final class with only static helper methods):

// Set of characters / Strings SOLR treats as having special meaning in a 
// query, and the corresponding Escaped versions.
// Note that the actual operators '&&' and '||' don't show up here - we'll 
// just escape the characters '&' and '|' wherever they occur.
private static final String[] SOLR_SPECIAL_CHARACTERS = new String[] {"+", 
"-", "&", "|", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", 
":", "\\"};
private static final String[] SOLR_REPLACEMENT_CHARACTERS = new String[] 
{"\\+", "\\-", "\\&", "\\|", "\\!", "\\(", "\\)", "\\{", "\\}", "\\[", "\\]", 
"\\^", "\\\"", "\\~", "\\*", "\\?", "\\:", "\\\\"};


/**
 * Escapes all special characters from the Search Terms, so they don't get 
confused with
 * the Solr query language special characters.
 * @param value - Search Term to escape
 * @return - escaped Search value, suitable for a Solr q parameter
 */
public static String escapeSolrCharacters(String value)
{
return StringUtils.replaceEach(value, SOLR_SPECIAL_CHARACTERS, 
SOLR_REPLACEMENT_CHARACTERS);
}
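For readers without Commons Lang on the classpath, a dependency-free sketch of the same idea (one pass over the input, each Solr query metacharacter prefixed with a backslash) might look like this; SolrEscaper is an illustrative name, not part of the class above:

```java
// Dependency-free sketch of Solr query-term escaping: prefix each
// metacharacter with a backslash in a single pass over the input.
public class SolrEscaper {
    // Characters Solr's query parser treats specially.
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');   // escape the metacharacter
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```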

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Bill Bell [mailto:billnb...@gmail.com]
 Sent: Sunday, September 25, 2011 12:22 AM
 To: solr-user@lucene.apache.org
 Subject: Best Solr escaping?
 
 What is the best algorithm for escaping strings before sending to Solr?
 Does
 someone have some code?
 
 A few things I have witnessed in q using DIH handler
 * Double quotes - " that are not balanced can cause several issues, from
 an
 error (strip the double quote?) to no results.
 * Should we use + or %20 - and what cases make sense:
  * "Dr. Phil Smith" or Dr.+Phil+Smith or Dr.%20Phil%20Smith - also
 what is
  the impact of double quotes?
 * Unmatched parentheses, i.e. opening ( and not closing:
  * (Dr. Holstein
  * Cardiologist+(Dr. Holstein
 Regular encoding of strings does not always work for the whole string
 due to
 several issues like white space:
 * White space works better when we escape it with a backslash (Bill\ Bell),
 especially
 when using facets.
 
 Thoughts? Code? Ideas? Better Wikis?
 
 




RE: select query does not find indexed pdf document

2011-09-12 Thread Bob Sandiford
Um - looks like you specified your id value as "pdfy", which is reflected in 
the results from the *:* query, but your id query is searching for "vpn", 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
 Sent: Monday, September 12, 2011 9:56 AM
 To: solr-user@lucene.apache.org
 Subject: Re: select query does not find indexed pdf document
 
 http://www/SearchApp/select/?q=id:vpn
 
 yields this:
   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">15</int>
       <lst name="params">
         <str name="q">id:vpn</str>
       </lst>
     </lst>
     <result name="response" numFound="0" start="0"/>
   </response>
 
 
 *
 
  http://www/SearchApp/select/?q=*:*
 
 yields this:
 
   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">16</int>
       <lst name="params">
         <str name="q">*.*</str>
       </lst>
     </lst>
     <result name="response" numFound="1" start="0">
       <doc>
         <str name="author">doc</str>
         <arr name="content_type">
           <str>application/pdf</str>
         </arr>
         <str name="id">pdfy</str>
         <date name="last_modified">2011-05-20T02:08:48Z</date>
         <arr name="title">
           <str>dmvpndeploy.pdf</str>
         </arr>
       </doc>
     </result>
   </response>
 
 
 From: Jan Høydahl jan@cominvent.com
 To: solr-user@lucene.apache.org; Michael Dockery
 dockeryjava...@yahoo.com
 Sent: Monday, September 12, 2011 4:59 AM
 Subject: Re: select query does not find indexed pdf document
 
 Hi,
 
 What do you get from a query http://www/SearchApp/select/?q=*:* or
 http://www/SearchApp/select/?q=id:vpn ?
 You may not have mapped the fields correctly to your schema?
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 On 12. sep. 2011, at 02:12, Michael Dockery wrote:
 
  I am new to solr.
 
  I tried to upload a pdf file via curl to my solr webapp (on tomcat)
 
  curl
 http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdfy&commit=true



  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">860</int></lst>
  </response>
 
 
  but
 
  http://www/SearchApp/select/?q=vpn
 
 
  does not find the document
 
 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
  <str name="q">vpn</str>
  </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  </response>
 
 
  help is appreciated.
 
  =
  fyi
  I point my test webapp to the index/solr home via mod meta-
 data/context.xml
  <Context crossContext="true">
     <Environment name="solr/home" type="java.lang.String"
   value="c:/solr_home" override="true" />
  </Context>
 
  and I had to copy all these jars to my webapp lib dir: (to avoid the
 classnotfound)
  Solr_download\contrib\extraction\lib
   ...in the future i plan to put them in the tomcat/lib dir.
 
 
  Also, I have not modified conf\solrconfig.xml or schema.xml.



RE: select query does not find indexed pdf document

2011-09-12 Thread Bob Sandiford
Hi, Michael.

Well, the stock answer is, 'it depends'

For example - would you want to be able to search filename without searching 
file contents, or would you always search both of them together?  If both, then 
copy both the file name and the parsed file content from the pdf into a single 
search field, and you can set that up as the default search field.
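As a sketch (the field and type names are illustrative, patterned on the Solr example schema), that combined default search field could be wired up in schema.xml like this:

```xml
<field name="filename" type="text_general" indexed="true" stored="true"/>
<field name="content"  type="text_general" indexed="true" stored="true"/>
<!-- catch-all search field: not stored, populated only via copyField -->
<field name="text"     type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="filename" dest="text"/>
<copyField source="content"  dest="text"/>

<defaultSearchField>text</defaultSearchField>
```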

Or - what kind of processing / normalizing do you want on this data?  Case 
insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. 
TheVeryIdea), do you want that split on the case changes?  (But then watch out 
for things like iPad.)  If a 'word' contains numbers, do you want them left 
together, or separated?  Do you want stemming (where searching for 'stemming' 
would also find 'stem', 'stemmed', that sort of thing)?  Is this always 
English, or are there other languages involved?  Do you want the text 
processing to be the same for indexing vs searching?  Do you want to be able 
to find hits based on the first few characters of a term?  (ngrams)

Do you want to be able to highlight text segments where the search terms were 
found?

Probably you want to read up on the various tokenizers and filters that are 
available.  Do some prototyping and see how it looks.

Here's a starting point: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
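For example, a field type combining several of the behaviours above (case folding, camel-case splitting, English stemming) might look like the following; the filter choices here are illustrative, not a recommendation:

```xml
<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits on case changes (TheVeryIdea) and letter/number boundaries -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming: a search for 'stemming' also matches 'stem', 'stemmed' -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```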

Basically, there is no 'one size fits all' here.  Part of the power of Solr / 
Lucene is its configurability to achieve the results your business case calls 
for.  Part of the drawback of Solr / Lucene - especially for new folks - is its 
configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you.  that worked.

Any tips for   very   very  basic setup of the schema xml?
   or is the default basic enough?

I basically only want to search on
filename   and    file contents


From: Bob Sandiford bob.sandif...@sirsidynix.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Michael 
Dockery dockeryjava...@yahoo.com
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as "pdfy", which is reflected in 
the results from the *:* query, but your id query is searching for "vpn", 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | 
bob.sandif...@sirsidynix.commailto:bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Michael Dockery 
 [mailto:dockeryjava...@yahoo.commailto:dockeryjava...@yahoo.com]
 Sent: Monday, September 12, 2011 9:56 AM
 To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org
 Subject: Re: select query does not find indexed pdf document

 http://www/SearchApp/select/?q=id:vpn

 yields this:
   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">15</int>
       <lst name="params">
         <str name="q">id:vpn</str>
       </lst>
     </lst>
     <result name="response" numFound="0" start="0"/>
   </response>


 *

  http://www/SearchApp/select/?q=*:*

 yields this:

   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">16</int>
       <lst name="params">
         <str name="q">*.*</str>
       </lst>
     </lst>
     <result name="response" numFound="1" start="0">
       <doc>
         <str name="author">doc</str>
         <arr name="content_type">
           <str>application/pdf</str>
         </arr>
         <str name="id">pdfy</str>
         <date name="last_modified">2011-05-20T02:08:48Z</date>
         <arr name="title">
           <str>dmvpndeploy.pdf</str>
         </arr>
       </doc>
     </result>
   </response>


 From: Jan Høydahl jan@cominvent.commailto:jan@cominvent.com
 To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org; Michael 
 Dockery
 dockeryjava...@yahoo.commailto:dockeryjava...@yahoo.com
 Sent: Monday, September 12, 2011 4:59 AM
 Subject: Re: select query does not find indexed pdf document

 Hi,

 What do you get from a query http://www/SearchApp/select/?q=*:* or
 http://www/SearchApp/select/?q=id:vpn ?
 You may not have mapped the fields correctly to your schema?

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 12. sep. 2011, at 02:12, Michael Dockery wrote:

  I am new to solr.
 
  I tried to upload a pdf file via curl to my solr webapp (on tomcat)
 
  curl
 http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdfy&commit=true
 
 
 
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name

RE: Sentence aware Highlighter

2011-09-06 Thread Bob Sandiford
What if you were to make your field a multi-valued field, and at indexing time, 
split up the text into sentences, putting each sentence into the solr document 
as one of the values for the mv field?  Then I think the normal highlighting 
code can be used to pull the entire value (i.e. sentence) of a matching mv 
instance within your document?  I.E. put the 'overhead' into the index step, 
rather than trying to do it at search time?
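A sketch of that index-time split, using the JDK's sentence BreakIterator; how each resulting sentence is then added as one value of the multivalued Solr field is left out:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Split a block of text into sentences at index time, so each sentence can
// be added as one value of a multivalued Solr field.
public class SentenceSplitter {
    public static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<String>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        // Walk the sentence boundaries the iterator reports.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }
}
```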

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
 Sent: Monday, September 05, 2011 10:33 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Sentence aware Highlighter
 
 (11/09/05 23:09), O. Klein wrote:
  Using the regex in the old highlighter I had reasonable sentence
 aware
  highlighting, but speed is an issue. So I tried to get this working
 with the
 FVH, but this obviously didn't work with the regex.
 
  So I am looking for ways to get the same behavior but with improved
 speed
  and came across https://issues.apache.org/jira/browse/LUCENE-1824,
 which at
  least would be a small improvement, but the last comment confused me,
 as I
  thought FVH was going to be the new highlighter for Solr. So this
 patch
  would make some sense if im not mistaken.
 
  Nonetheless has anyone managed to make something like a
  SentenceAwareFragmentsBuilder? Or have some advise on how to realise
 this?
 
 Sorry for the long delay on the issue!
 I'd like to take a look into it in this week. Hopefully, BreakIterator
 may be
 used, which Robert mentioned in the JIRA.
 
 Thank you for your patience!
 
 koji
 --
 Check out Query Log Visualizer for Apache Solr
 http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/




RE: performance crossover between single index and sharding

2011-08-04 Thread Bob Sandiford
Dumb question time - you are using a 64 bit Java, and not a 32 bit Java?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
 Sent: Thursday, August 04, 2011 2:39 AM
 To: solr-user@lucene.apache.org
 Subject: Re: performance crossover between single index and sharding
 
 Hi Shawn,
 
 the 0.05 seconds for search time at peek times (3 qps) is my target for
 Solr.
 The numbers for Solr are from Solr's statistics report page. So 39.5
 seconds
 average per request is definitely too long and I have to change to
 sharding.
 
 For FAST system the numbers for the search dispatcher are:
   0.042 sec elapsed per normal search, on avg.
   0.053 sec average uncached normal search time (last 100 queries).
   99.898% of searches using < 1 sec
   99.999% of searches using < 3 sec
   0.000% of all requests timed out
   22454567.577 sec time up (that is 259 days)
 
 Is there a report page for those numbers for Solr?
 
 About the RAM: the 32GB RAM is physical for each VM and the 20GB RAM
 is the -Xmx setting for Java.
 Yesterday I noticed that we are running out of heap during replication
 so I have to
 increase -Xmx to about 22g.
 
 The reported 0.6 average requests per second seems right to me because
 the Solr system isn't under full load yet. The FAST system is still
 taking
 most of the load. I plan to switch completely to Solr after sharding is
 up and
 running stable. So there will be additional 3 qps to Solr at peek
 times.
 
 I don't know if a controlling master like FAST makes any sense for
 Solr.
 The small VMs with heartbeat and haproxy sounds great, must be on my
 todo list.
 
 But the biggest problem currently is how to configure the DIH to split
 up the
 content to several indexers. Is there an indexing distributor?
 
 Regards,
 Bernd
 
 
 Am 03.08.2011 16:33, schrieb Shawn Heisey:
  Replies inline.
 
  On 8/3/2011 2:24 AM, Bernd Fehling wrote:
  To show that I compare apples and oranges here are my previous FAST
 Search setup:
  - one master server (controlling, logging, search dispatcher)
  - six index server (4.25 mio docs per server, 5 slices per index)
  (searching and indexing at the same time, indexing once per week
 during the weekend)
  - each server has 4GB RAM, all servers are physical on seperate
 machines
  - RAM usage controlled by the processes
  - total of 25.5 mio. docs (mainly metadata) from 1500 databases
 worldwide
  - index size is about 67GB per indexer -- about 402GB total
  - about 3 qps at peek times
  - with average search time of 0.05 seconds at peek times
 
  An average query time of 50 milliseconds isn't too bad. If the number
 from your Solr setup below (39.5) is the QTime, then Solr thinks it is
  performing better, but Solr's QTime does not include absolutely
 everything that has to happen. Do you by chance have 95th and 99th
 percentile
  query times for either system?
 
  And here is now my current Solr setup:
  - one master server (indexing only)
  - two slave server (search only) but only one is online, the second
 is fallback
  - each server has 32GB RAM, all server are virtuell
  (master on a seperate physical machine, both slaves together on a
 physical machine)
  - RAM usage is currently 20GB to java heap
  - total of 31 mio. docs (all metadata) from 2000 databases worldwide
  - index size is 156GB total
  - search handler statistic report 0.6 average requests per second
  - average time per request 39.5 (is that seconds?)
  - building the index from scratch takes about 20 hours
 
  I can't tell whether you mean that each physical host has 32GB or
 each VM has 32GB. You want to be sure that you are not oversubscribing
 your
  memory. If you can get more memory in your machines, you really
 should. Do you know whether that 0.6 seconds is most of the delay that
 a user
  sees when making a search request, or are there other things going on
 that contribute more delay? In our webapp, the Solr request time is
  usually small compared with everything else the server and the user's
 browser are doing to render the results page. As much as I hate being
 the
  tall pole in the tent, I look forward to the day when the developers
 can change that balance.
 
  The good thing is I have the ability to compare a commercial product
 and
  enterprise system to open source.
 
  I started with my simple Solr setup because of kiss (keep it
 simple and stupid).
  Actually it is doing excellent as single index on a single virtuell
 server.
  But the average time per request should be reduced now, thats why I
 started
  this discussion.
  While searches with smaller Solr index size (3 mio. docs) showed
 that it can
  stand with FAST Search it now shows that its time to go with
 sharding.
  I think we are already far behind the point of search performance
 crossover.
 
  What I hope to get with sharding:
  - reduce time

RE: previous and next rows of current record

2011-07-21 Thread Bob Sandiford
Well, it sort of depends on what you mean by the 'previous' and the 'next' 
record.

Do you have some type of sequencing built into your concept of your solr / 
lucene indexes?  Do you have sequential id's?

i.e. What's the use case, and what's the data available to support your use 
case?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
 Sent: Thursday, July 21, 2011 2:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: previous and next rows of current record
 
 Pls help..
 
 On Thursday, July 21, 2011, Jonty Rhods jonty.rh...@gmail.com wrote:
  Hi,
 
  Is there any special query in Solr to get the previous and next
 record of the current record? I am getting a single record's detail using its
 id from the Solr server. I need to get next and previous on the detail page.

  regards, Jonty
 




RE: previous and next rows of current record

2011-07-21 Thread Bob Sandiford
But - what is it that makes '9' the next id after '5'?  Why not '6'?  Or 
'91238412'?  Or '4'?

i.e. you still haven't answered the question about what 'next' and 'previous' 
really means...

But - if you already know that '9' is the next page, why not just do another 
query with id '9' to get the next record?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
 Sent: Thursday, July 21, 2011 2:33 PM
 To: solr-user@lucene.apache.org
 Subject: Re: previous and next rows of current record
 
 Hi
 
 in my case there is no id sequence. Ids are generated sequentially across
 all categories, but when we filter by category the ids within that category
 become non-sequential. If I am on the detail page for id 5 and the next id
 is 9, then on that same page my requirement is to get the next id, 9.
 
 On Thursday, July 21, 2011, Bob Sandiford
 bob.sandif...@sirsidynix.com wrote:
  Well, it sort of depends on what you mean by the 'previous' and the
 'next' record.
 
  Do you have some type of sequencing built into your concept of your
 solr / lucene indexes?  Do you have sequential id's?
 
  i.e. What's the use case, and what's the data available to support
 your use case?
 
  Bob Sandiford | Lead Software Engineer | SirsiDynix
  P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
  www.sirsidynix.com
 
  -Original Message-
  From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
  Sent: Thursday, July 21, 2011 2:18 PM
  To: solr-user@lucene.apache.org
  Subject: Re: previous and next rows of current record
 
  Pls help..
 
  On Thursday, July 21, 2011, Jonty Rhods jonty.rh...@gmail.com
 wrote:
   Hi,
  
   Is there any special query in solr to get the previous and next
  record of current record.I am getting single record detail using id
  from solr server. I need to get  next and previous on detail page.
  
   regardsJonty
  
 
 
 




RE: Restricting the Solr Posting List (retrieved set)

2011-07-11 Thread Bob Sandiford
A good answer may also depend on WHY you are wanting to restrict to 500K 
documents.

Are you seeking to reduce the time spent by Solr in determining the doc count?  
Are you just wanting to prevent people from moving too far into the result set? 
 Is it the case that you can only display 6 digits for your return count? :)

If Solr is performing adequately, you could always just artificially restrict 
the result set.  Solr doesn't actually 'return' all 5M documents - it only 
returns the number you have specified in your query (as well as having some 
cache for the next results in anticipation of a subsequent query).  So, if the 
total count returned exceeds 500K, then just report 500K as the number of 
results, and similarly restrict how far a user can page through the results...
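The capping itself is trivial application-side arithmetic; a sketch (the 500,000 limit comes from the discussion, the page size is illustrative):

```java
// Cap the hit count reported to users and the pages they may navigate to,
// without changing the Solr query itself.
public class ResultCapper {
    private static final long MAX_REPORTED = 500000L;

    /** Number of hits to report, never more than the cap. */
    public static long reportedCount(long numFound) {
        return Math.min(numFound, MAX_REPORTED);
    }

    /** Highest page a user may request, given the capped count. */
    public static long maxPage(long numFound, int pageSize) {
        long capped = reportedCount(numFound);
        return (capped + pageSize - 1) / pageSize;  // ceiling division
    }
}
```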

(And - you can (and sounds like you should) sort your results by descending 
post date so that you do in fact get the most recent ones coming back first...)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Ahmet Arslan [mailto:iori...@yahoo.com]
 Sent: Monday, July 11, 2011 7:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Restricting the Solr Posting List (retrieved set)
 
 
  We want to search in an index in such a way that even if a
  clause has a long
  posting list - Solr should stop collecting documents for
  the clause
  after receiving X documents that match the clause.
 
  For example, if  for query India,solr can return 5M
  documents, we would
  like to restrict the set at only 500K documents.
 
  The assumption is that since we are posting chronologically
  - we would like
  the X most recent documents to be matched for the clause
  only.
 
  Is it possible anyway?
 
 Looks like your use-case is suitable for time based sharding.
 http://wiki.apache.org/solr/DistributedSearch
 
 Lets say you divide your shards according to months. You will have a
 separate core for each month.
 http://wiki.apache.org/solr/CoreAdmin
 
 When a query comes in, you will hit the most recent core. If you don't
 obtain enough results add a new value (previous month core) to shards=
 parameter.
 




RE: updating existing data in index vs inserting new data in index

2011-07-07 Thread Bob Sandiford
What are you using as the unique id in your Solr index?  It sounds like you may 
have one value as your Solr index unique id, which bears no resemblance to a 
unique[1] id derived from your data...

Or - another way to put it - what is it that makes these two records in your 
Solr index 'the same', and what are the unique id's for those two entries in 
the Solr index?  How are those id's related to your original data?

[1] not only unique, but immutable.  I.E. if you update a row in your database, 
the unique id derived from that row has to be the same as it would have been 
before the update.  Otherwise, there's nothing for Solr to recognize as a 
duplicate entry, and do a 'delete' and 'insert' instead of just an 'insert'.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Mark juszczec [mailto:mark.juszc...@gmail.com]
 Sent: Thursday, July 07, 2011 9:15 AM
 To: solr-user@lucene.apache.org
 Subject: updating existing data in index vs inserting new data in index
 
 Hello all
 
 I'm using Solr 3.2 and am confused about updating existing data in an
 index.
 
 According to the DataImportHandler Wiki:
 
 *delta-import* : For incremental imports and change detection run the
 command `http://host:port/solr/dataimport?command=delta-import . It
 supports the same clean, commit, optimize and debug parameters as
 full-import command.
 
 I know delta-import will find new data in the database and insert it
 into
 the index.  My problem is how it handles updates where I've got a record
 that exists in the index and the database, the database record is
 changed
 and I want to incorporate those changes in the existing record in the
 index.
  IOW I don't want to insert it again.
 
 I've tried this and wound up with 2 records with the same key in the
 index.
  The first contains the original db values found when the index was
 created,
 the 2nd contains the db values after the record was changed.
 
 I've also found this
 http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.4720
 66.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
 the
 subject is 'Delta-import with solrj client'
 
 Greetings. I have a solrj client for fetching data from database. I
 am
 using delta-import for fetching data. If a column is changed in
 database
 using timestamp with delta-import i get the latest column indexed
 but
 there are duplicate values in the index similar to the column but the
 data
 is older. This works with cleaning the index but i want to update the
 index
 without cleaning it. Is there a way to just update the index with the
 updated column without having duplicate values. Appreciate for any
 feedback.

 Hando
 
 There are 2 responses:
 
 Short answer is no, there isn't a way. Solr doesn't have the concept
 of
 'Update' to an indexed document. You need to add the full document (all
 'columns') each time any one field changes. If doing that in your
 DataImportHandler logic is difficult you may need to write a separate
 Update
 Service that does:

 1) Read UniqueID, UpdatedColumn(s) from database
 2) Using UniqueID Retrieve document from Solr
 3) Add/Update field(s) with updated column(s)
 4) Add document back to Solr

 Although, if you use DIH to do a full import, using the same query in
 your Delta-Import to get the whole document shouldn't be that
 difficult.

 and

 Hi,

 Make sure you use a proper ID field, which does *not* change even if
 the
 content in the database changes. In this way, when your
 delta-import fetches
 changed rows to index, they will update the existing rows in your index.
 
 
 I have an ID field that doesn't change.  It is the primary key field
 from
 the database table I am trying to index and I have verified it is
 unique.
 
 So, does Solr allow updates (not inserts) of existing records?  Is
 anyone
 able to do this?
 
 Mark



RE: updating existing data in index vs inserting new data in index

2011-07-07 Thread Bob Sandiford
Hi, Mark.

I haven't used DIH myself - so I'll need to leave comments on your set up to 
others who have done so.

Another question - after your initial index create (and after each delta), do 
you run a 'commit'?  Do you run an 'optimize'?  (Without the optimize, 
'deleted' records still show up in query results...)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Mark juszczec [mailto:mark.juszc...@gmail.com]
 Sent: Thursday, July 07, 2011 10:04 AM
 To: solr-user@lucene.apache.org
 Subject: Re: updating existing data in index vs inserting new data in
 index
 
 Bob
 
 Thanks very much for the reply!
 
 I am using a unique integer called order_id as the Solr index key.
 
 My query, deltaQuery and deltaImportQuery are below:
 
  <entity name="item1"
    pk="ORDER_ID"
    query="select 1 as TABLE_ID, orders.order_id, orders.order_booked_ind,
      orders.order_dt, orders.cancel_dt, orders.account_manager_id,
      orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
      orders.approved_discount_pct, orders.campaign_nm,
      orders.approved_by_cd, orders.advertiser_id, orders.agency_id
      from orders"
    deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
      orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
      orders.account_manager_id, orders.of_header_id, orders.order_status_lov_id,
      orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm,
      orders.approved_by_cd, orders.advertiser_id, orders.agency_id
      from orders where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
    deltaQuery="select orders.order_id from orders where orders.change_dt >
      to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
  />
 
 The test I am running is two part:
 
 1.  After I do a full import of the index, I insert a brand new record
 (with
 a never existed before order_id) in the database.  The delta import
 picks
 this up just fine.
 
 2.  After the full import, I modify a record with an order_id that
 already
 shows up in the index.  I have verified there is only one record with
 this
 order_id in both the index and the db before I do the delta update.
 
 I guess the question is, am I screwing myself up by defining my own Solr
 index key?  I want to, ultimately, be able to search on ORDER_ID in the
 Solr
 index.  However, the docs say (I think) a field does not have to be the
 Solr
 primary key in order to be searchable.  Would I be better off letting
 Solr
 manage the keys?
 
 Mark
 
 On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
 bob.sandif...@sirsidynix.comwrote:
 
  What are you using as the unique id in your Solr index?  It sounds
 like you
  may have one value as your Solr index unique id, which bears no
 resemblance
  to a unique[1] id derived from your data...
 
  Or - another way to put it - what is it that makes these two records
 in
  your Solr index 'the same', and what are the unique id's for those two
  entries in the Solr index?  How are those id's related to your
 original
  data?
 
  [1] not only unique, but immutable.  I.E. if you update a row in your
  database, the unique id derived from that row has to be the same as it
 would
  have been before the update.  Otherwise, there's nothing for Solr to
  recognize as a duplicate entry, and do a 'delete' and 'insert' instead
 of
  just an 'insert'.
 
  Bob Sandiford | Lead Software Engineer | SirsiDynix
  P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
  www.sirsidynix.com
 
 
   -Original Message-
   From: Mark juszczec [mailto:mark.juszc...@gmail.com]
   Sent: Thursday, July 07, 2011 9:15 AM
   To: solr-user@lucene.apache.org
   Subject: updating existing data in index vs inserting new data in
 index
  
   Hello all
  
   I'm using Solr 3.2 and am confused about updating existing data in
 an
   index.
  
   According to the DataImportHandler Wiki:
  
    *delta-import* : For incremental imports and change detection run the
    command `http://host:port/solr/dataimport?command=delta-import`. It
    supports the same clean, commit, optimize and debug parameters as the
    full-import command.
  
   I know delta-import will find new data in the database and insert it
   into
   the index.  My problem is how it handles updates where I've got a
 record
   that exists in the index and the database, the database record is
   changed
   and I want to incorporate those changes in the existing record in
 the
   index.
IOW I don't want to insert it again.
  
   I've tried this and wound up with 2 records with the same key in the
   index.
The first contains the original db values found when the index was
   created,
   the 2nd contains the db values after the record was changed.
  
    I've also found this:
    http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
    the subject is 'Delta-import

Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
Hi, all.

I'm hoping someone has some thoughts here.

We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
getLuceneVersion() calls, but use luceneMatchVersion directly).

We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: -Xmx7168m 
-Xms7168m -XX:MaxPermSize=256M

We're running 2 Solr cores, with the same schema.

We use SolrJ to run our searches from a Java app running in JBoss.

JBoss, Tomcat, and the Solr Index folders are all on the same server.

In case it's relevant, we're using JMeter as a load test harness.

We're running on Solaris, a 16 processor box with 48GB physical memory.

I've run a successful load test at a 100 user load (at that rate there are 
about 5-10 solr searches / second), and solr search responses were coming in 
under 100ms.

When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We have 
some logging statements around the SolrJ calls - just before, we log how long 
our query construction takes, then we run the SolrJ query and log the search 
times.  We're getting a number of the query construction logs, but no 
corresponding search time logs).

Symptoms:
The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
the top processes.  CPU states show around 99% idle.   RES usage for the two 
Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
sleep.  JBoss is still 'alive', as I can get into a piece of software that 
talks to our JBoss app to get data.

We set things up to use log4j logging for Solr - the log isn't showing any 
errors or exceptions.

We're not indexing - just searching.

Back in January, we did load testing on a prototype, and had no problems 
(though that was Solr 1.4 at the time).  It ramped up beautifully - 
bottlenecks were our apps, not Solr.  What I'm benchmarking now is a descendant of 
that prototyping - a bit more complex on searches and more fields in the 
schema, but same basic search logic as far as SolrJ usage.

Any ideas?  What else to look at?  Ringing any bells?

I can send more details if anyone wants specifics...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/



RE: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Bob Sandiford
OK - I figured it out.  It's not solr at all (and I'm not really surprised).

In the prototype benchmarks, we used a different instance of tomcat than we're 
using for production load tests.  Our prototype tomcat instance had no 
maxThreads value set, so was using the default value of 200.  The production 
tomcat environment has a maxThreads value of 15 - we were just running out of 
threads and getting connection refused exceptions thrown when we ramped up the 
Solr hits past a certain level.
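For anyone hitting the same wall: the thread ceiling lives on Tomcat's HTTP Connector in conf/server.xml. A sketch of the relevant entry (values illustrative, not a tuning recommendation):

```xml
<!-- maxThreads caps concurrent request-processing threads.
     Tomcat's default is 200; a value as low as 15 will start
     refusing connections under even moderate Solr query load. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           connectionTimeout="20000" />
```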

Thanks for considering, Yonik (and any others waiting to see any reply I 
made)...

(As others have said - this listserv is great!)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, June 29, 2011 12:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr just 'hangs' under load test - ideas?
 
 Can you get a thread dump to see what is hanging?
 
 -Yonik
 http://www.lucidimagination.com
 
 On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
 bob.sandif...@sirsidynix.com wrote:
  Hi, all.
 
  I'm hoping someone has some thoughts here.
 
  We're running Solr 3.1 (with the patch for SolrQueryParser.java to
 not do the getLuceneVersion() calls, but use luceneMatchVersion
 directly).
 
  We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are:
 -Xmx7168m -Xms7168m -XX:MaxPermSize=256M
 
  We're running 2 Solr cores, with the same schema.
 
  We use SolrJ to run our searches from a Java app running in JBoss.
 
  JBoss, Tomcat, and the Solr Index folders are all on the same server.
 
  In case it's relevant, we're using JMeter as a load test harness.
 
  We're running on Solaris, a 16 processor box with 48GB physical
 memory.
 
  I've run a successful load test at a 100 user load (at that rate
 there are about 5-10 solr searches / second), and solr search responses
 were coming in under 100ms.
 
  When I tried to ramp up, as far as I can tell, Solr is just hanging.
  (We have some logging statements around the SolrJ calls - just before,
 we log how long our query construction takes, then we run the SolrJ
 query and log the search times.  We're getting a number of the query
 construction logs, but no corresponding search time logs).
 
  Symptoms:
  The Tomcat and JBoss processes show as well under 1% CPU, and they
 are still the top processes.  CPU states show around 99% idle.   RES
 usage for the two Java processes around 3GB each.  LWP under 120 for
 each.  STATE just shows as sleep.  JBoss is still 'alive', as I can get
 into a piece of software that talks to our JBoss app to get data.
 
  We set things up to use log4j logging for Solr - the log isn't
 showing any errors or exceptions.
 
  We're not indexing - just searching.
 
  Back in January, we did load testing on a prototype, and had no
 problems (though that was Solr 1.4 at the time).  It ramped up
 beautifully - bottle necks were our apps, not Solr.  What I'm
 benchmarking now is a descendent of that prototyping - a bit more
 complex on searches and more fields in the schema, but same basic
 search logic as far as SolrJ usage.
 
  Any ideas?  What else to look at?  Ringing any bells?
 
  I can send more details if anyone wants specifics...
 
  Bob Sandiford | Lead Software Engineer | SirsiDynix
  P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
  www.sirsidynix.comhttp://www.sirsidynix.com/
 
 




RE: MultiValued facet behavior question

2011-06-22 Thread Bob Sandiford
 the facet field name.
3) You'll want to read up on the MemoryIndex class to see more about how it 
works, rather than me re-iterating that here.

[1] Caveats
1) We didn't do anything with the date type faceting, or with any ranges.
2) We didn't do anything with Facet prefix handling - it may or may not work if 
you need prefixes.
3) Anything else that facets do that we didn't handle - or at least, didn't 
test :)  As I say, it's a very special case for us, and this is in no way 
intended to be a general solution or fit for 'prime time' submission as a Solr 
enhancement.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Bill Bell [mailto:billnb...@gmail.com]
 Sent: Wednesday, June 22, 2011 3:49 AM
 To: solr-user@lucene.apache.org
 Subject: Re: MultiValued facet behavior question
 
 You can type q=cardiology and match on cardiologist. If stemming did
 not
 work you can just add a synonym:
 
 cardiology,cardiologist
 
 But that is not the issue. The issue is around multiValue fields and
 facets. You would expect a user
 Who is searching on the multiValued field to match on some values in
 there. For example,
 they type Cardiologist and it matches on the value Cardiologist. So
 it
 matches in the multiValue field.
 So that part works. Then when I output the facet, I need a different
 behavior than the default. I need
 The facet to only output the value that matches (scored) - NOT ALL
 VALUES
 in the multiValued field.
 
 I think it makes sense?
 
 
 On 6/22/11 1:42 AM, Michael Kuhlmann s...@kuli.org wrote:
 
 Am 22.06.2011 05:37, schrieb Bill Bell:
  It can get more complicated. Here is another example:
 
  q=cardiologydefType=dismaxqf=specialties
 
 
  (Cardiology and cardiologist are stems)...
 
  But I don't really know which value in Cardiologist match perfectly.
 
  Again, I only want it to return:
 
  Cardiologist: 3
 
 You would never get Cardiologist: 3 as the facet result, because if
 Cardiologist would be in your index, it's impossible to find it when
 searching for cardiology (except when you manage to write some
 strange
 tokenizer that translates cardiology to Cardiologist on query
 time,
 including the upper case letter).
 
 Facets are always taken from the index, so they usually match exactly
 or
 never when querying for it.
 
 -Kuli
 
 




RE: difficult sort

2011-06-17 Thread Bob Sandiford
What if you were to set up a new field, which is the concatenation of your 
'field' and 'category group', and then facet on that?  How many combinations 
would we be talking about here?  And - against what field(s) do you run your 
query?
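As a sketch of the concatenated-field idea (field and delimiter choices hypothetical), the value you would index in the new field could be built like this; lexicographic order on it then sorts within each category group:

```python
def group_sort_key(category_group, sort_field_value):
    # Index this concatenation in its own field; sorting or faceting on
    # it groups by category first, then by the sort field. Lower-casing
    # makes the within-group sort case-insensitive.
    return "%s|%s" % (category_group, sort_field_value.lower())
```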

We did something a bit similar, where we wanted an 'author' search, where 
'author' is a field in our documents.  We have a field set up based on 'author' 
to search against, as well as a field based on 'author' for faceting.  We 
search against the author field, return 0 results but all the facet values, and 
then display the facet values with their counts, and when the users select one, 
then we issue a new query to return all documents with that author facet value.
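The two-query flow described above can be sketched as follows; the field names author_search and author_facet are hypothetical, and only the query-string construction is shown (the HTTP call to Solr is omitted):

```python
from urllib.parse import urlencode

def author_facet_params(term):
    # First query: search the author field but ask for no documents
    # (rows=0); only the facet values and their counts come back.
    return urlencode({
        "q": "author_search:%s" % term,
        "rows": 0,
        "facet": "true",
        "facet.field": "author_facet",
        "facet.mincount": 1,
    })

def author_drilldown_params(facet_value):
    # Second query: once the user picks a facet value, filter on the
    # exact value to fetch all documents for that author.
    return urlencode({"q": "*:*", "fq": 'author_facet:"%s"' % facet_value})
```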

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: lee carroll [mailto:lee.a.carr...@googlemail.com]
 Sent: Friday, June 17, 2011 5:47 AM
 To: solr-user@lucene.apache.org
 Subject: difficult sort
 
 Is this possible in 1.4.1
 
 Return a result set sorted by a field but within Categorical groups,
 limited to 1 record per group
 Something like:
 group1
 xxx (bottom of sorted field within group)
 group2
 xxx (bottom of sorted field within group)
 etc
 
 is the only approach to issue multiple queries and collate in the
 front end app ?
 
 cheers lee c




RE: Copying few field using copyField to non multiValued field

2011-06-15 Thread Bob Sandiford
Omri - you need to indicate to Solr that your at_location field can accept 
multiple values.  Add this to the field declaration:

multiValued=true

See this reference for more information / options:

http://wiki.apache.org/solr/SchemaXml
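For illustration, the declaration the three copyField directives need would look like this in schema.xml (a sketch: attributes follow the original post, except indexed is shown as "true" here, and multiValued="true" is the addition):

```xml
<field name="at_location" type="text" indexed="true" stored="true"
       required="false" multiValued="true"/>

<copyField source="at_city"    dest="at_location"/>
<copyField source="at_state"   dest="at_location"/>
<copyField source="at_country" dest="at_location"/>
```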


Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Omri Cohen [mailto:omri...@gmail.com]
 Sent: Wednesday, June 15, 2011 8:00 AM
 To: solr-user@lucene.apache.org
 Subject: Copying few field using copyField to non multiValued field
 
 Hello all,
 
 in my schema.xml i have this fields:
 
    <field name="at_location" type="text" indexed="index" stored="true" required="false" />
    <field name="at_country"  type="text" indexed="index" stored="true" required="false" />
    <field name="at_city"     type="text" indexed="index" stored="true" required="false" />
    <field name="at_state"    type="text" indexed="index" stored="true" required="false" />
 
  I am trying to do the following:
 
  <copyField source="at_city" dest="at_location"/>
  <copyField source="at_state" dest="at_location"/>
  <copyField source="at_country" dest="at_location"/>
 
 I am getting the next exception:
 
 ERROR: multiple values encountered for non multiValued copy field
 at_location
 
 some one has any idea, how I solve this without changing at_location to
 multiField?
 
 thanks
 
 *Omri Cohen*
 
 
 
 Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-
 6036295
 
 
 
 
 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric
 [image:
 Twitter] http://www.twitter.com/omricohe [image:
 WordPress]http://omricohen.me
  Please consider your environmental responsibility. Before printing
 this
 e-mail message, ask yourself whether you really need a hard copy.
 IMPORTANT: The contents of this email and any attachments are
 confidential.
 They are intended for the named recipient(s) only. If you have received
 this
 email by mistake, please notify the sender immediately and do not
 disclose
 the contents to anyone or make copies thereof.
 Signature powered by
 http://www.wisestamp.com/email-
 install?utm_source=extensionutm_medium=emailutm_campaign=footer
 WiseStamphttp://www.wisestamp.com/email-
 install?utm_source=extensionutm_medium=emailutm_campaign=footer



RE: Copying few field using copyField to non multiValued field

2011-06-15 Thread Bob Sandiford
Oops - sorry - missed that...

Well, the multiValued setting is explicitly to allow multiple values.

So - what's your actual use case - i.e. why do you want multiple values in a 
field, but not want it to be multiValued?  What's the problem you're trying to 
solve here?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


 -Original Message-
 From: Omri Cohen [mailto:omri...@gmail.com]
 Sent: Wednesday, June 15, 2011 8:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Copying few field using copyField to non multiValued field
 
 thanks for the quick response, though as I said in my original post:
 
 *some one has any idea, how I solve this without changing at_location to
 multiField?*
 
 thank you very much though
 
 
 
 
 On Wed, Jun 15, 2011 at 3:21 PM, Bob Sandiford
 bob.sandif...@sirsidynix.com
  wrote:
 
  Omri - you need to indicate to Solr that your at_location field can
 accept
  multiple values.  Add this to the field declaration:
 
 multiValued=true
 
  See this reference for more information / options:
 
  http://wiki.apache.org/solr/SchemaXml
 
 
  Bob Sandiford | Lead Software Engineer | SirsiDynix
  P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
  www.sirsidynix.com
 
 
   -Original Message-
   From: Omri Cohen [mailto:omri...@gmail.com]
   Sent: Wednesday, June 15, 2011 8:00 AM
   To: solr-user@lucene.apache.org
   Subject: Copying few field using copyField to non multiValued field
  
   Hello all,
  
   in my schema.xml i have this fields:
  
  field name=at_location   type=text
 indexed=index
   stored=true required=false /
  field name=at_country   type=text
 indexed=index
   stored=true required=false /
  field name=at_city   type=text
 indexed=index
   stored=true required=false /
  field name=at_state   type=text indexed=index
   stored=true required=false /.
  
   I am trying to do the following:
  
   copyField source=at_city dest=at_location/
   copyField source=at_state dest=at_location/
   copyField source=at_country dest=at_location/
  
   I am getting the next exception:
  
   ERROR: multiple values encountered for non multiValued copy field
   at_location
  
   some one has any idea, how I solve this without changing
 at_location to
   multiField?
  
   thanks
  
 
 



RE: Text field case sensitivity problem

2011-06-14 Thread Bob Sandiford
Unfortunately, wild card search terms don't get processed by the analyzers.

One suggestion that's fairly common is to make sure you lower case your wild 
card search terms yourself before issuing the query.
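A minimal client-side sketch of that suggestion; it mirrors only the LowerCaseFilterFactory step, not the rest of the analyzer chain, so it only helps when lower-casing is the sole index-time transformation:

```python
def normalize_wildcard(term):
    # Wildcard terms bypass the index/query analyzers, so against a
    # lower-cased index "Kris*" looks for a literal capital 'K'.
    # Lower-case such terms client-side before sending the query.
    if "*" in term or "?" in term:
        return term.lower()
    return term
```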

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Jamie Johnson [mailto:jej2...@gmail.com]
 Sent: Tuesday, June 14, 2011 5:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Text field case sensitivity problem
 
 Also of interest to me is this returns results
 http://localhost:8983/solr/select?defType=luceneq=Person_Name:Kristine
 
 
 On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  I am using the following for my text field:
 
  fieldType name=text class=solr.TextField
  positionIncrementGap=100 autoGeneratePhraseQueries=true
analyzer type=index
  tokenizer class=solr.WhitespaceTokenizerFactory/
  !-- in this example, we will only use synonyms at query time
  filter class=solr.SynonymFilterFactory
  synonyms=index_synonyms.txt ignoreCase=true expand=false/
  --
  !-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and
 query
analyzers to leave a 'gap' for more accurate phrase
 queries.
  --
  filter class=solr.StopFilterFactory
  ignoreCase=true
  words=stopwords.txt
  enablePositionIncrements=true
  /
  filter class=solr.WordDelimiterFilterFactory
  generateWordParts=1 generateNumberParts=1 catenateWords=1
  catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
  filter class=solr.LowerCaseFilterFactory/
  filter class=solr.KeywordMarkerFilterFactory
  protected=protwords.txt/
  filter class=solr.PorterStemFilterFactory/
/analyzer
analyzer type=query
  tokenizer class=solr.WhitespaceTokenizerFactory/
  filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt
  ignoreCase=true expand=true/
  filter class=solr.StopFilterFactory
  ignoreCase=true
  words=stopwords.txt
  enablePositionIncrements=true
  /
  filter class=solr.WordDelimiterFilterFactory
  generateWordParts=1 generateNumberParts=1 catenateWords=0
  catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
  filter class=solr.LowerCaseFilterFactory/
  filter class=solr.KeywordMarkerFilterFactory
  protected=protwords.txt/
  filter class=solr.PorterStemFilterFactory/
/analyzer
  /fieldType
 
  I have a field defined as
 field name=Person_Name type=text stored=true indexed=true
 /
 
  when I execute a go to the following url I get results
  http://localhost:8983/solr/select?defType=luceneq=Person_Name:kris*
  but if I do
  http://localhost:8983/solr/select?defType=luceneq=Person_Name:Kris*
  I get nothing.  I thought the LowerCaseFilterFactory would have
 handled
  lowercasing both the query and what is being indexed, am I missing
  something?
 



Odd (i.e. wrong) File Names in 3.1 distro source zip

2011-05-31 Thread Bob Sandiford
Hi, all.

I just downloaded the apache-solr-3.1.0-src.gz file, and unzipped that.  I see 
inside there an apache-solr-3.1.0-src file, and tried unzipping that.  There 
weren't any errors, but as I look inside the apache-solr-3.1.0-src file, I see 
that not all the Java code (for example) ended up being unzipped with a .java 
extension.

For example,  in the path 
apache-solr-3.1.0\lucene\backwards\src\test\org\apache\lucene\analysis\tokenattributes
 I see two files:
TestSimpleAtt100644
TestTermAttri100644

Any ideas?  Is there some specific tool I should be using to expand these?

I'm doing this in Windows XP.

Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/
Join the conversation - you may even get an iPad or Nook out of it!

Like us on Facebook!

Follow us on Twitter!




RE: highlighting in multiValued field

2011-05-26 Thread Bob Sandiford
What is your actual query?  Did you look at the hl.snippets parameter?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 



 -Original Message-
 From: Jeffrey Chang [mailto:jclal...@gmail.com]
 Sent: Thursday, May 26, 2011 11:10 PM
 To: solr-user@lucene.apache.org
 Subject: highlighting in multiValued field
 
 Hi All,
 
 I am having a problem with search highlighting for multiValued fields
 and am
 wondering if someone can point me in the right direction.
 
 I have in my schema a multiValued field as such:
  field name=description type=text stored=true indexed=true
 multiValued=true/
 
 When I search for term Tel, it returns me the correct doc:
 doc
 ...
 arr name=description
   strTel to talent 1/str
   strTel to talent 2/str
   /arr
 ...
 /doc
 
 When I enable highlighting, it returns me the following highlight with
 only
 one vector returned:
 ...
 lst name=highlighting
   lst name=1
   arr name=description
 stremTel/em to talent 1/str
   /arr
   /lst
 /lst
 What I'm expecting is actually both vectors to be returned as such:
 lst name=highlighting
   lst name=1
   arr name=description
 stremTel/em to talent 1/str
 stremTel/em to talent 2/str
   /arr
   /lst
 /lst
 Am I doing something wrong in my config or query (I'm using default)?
 Any
 help is appreciated.
 
 Thanks,
 Jeff



RE: highlighting in multiValued field

2011-05-26 Thread Bob Sandiford
The only thing I can think of is to post-process your snippets.  I.E. pull the 
highlighting tags out of the strings, look for the match in your result 
description field looking for a match, and if you find one, replace that 
description with the original highlight text (i.e. with the highlight tags 
still in place).
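A sketch of that post-processing; it assumes the default <em> highlight markers and matches by exact string equality, which can fail if the highlighter returns a truncated fragment rather than the full field value:

```python
import re

def match_snippets(snippets, descriptions):
    """Map each highlighted snippet back to the index of the stored
    description it came from by stripping the highlight tags and
    looking for an exact match. Returns -1 where no match is found."""
    positions = []
    for snippet in snippets:
        plain = re.sub(r"</?em>", "", snippet)  # remove <em>...</em> markers
        positions.append(descriptions.index(plain) if plain in descriptions else -1)
    return positions
```

Given the snippet positions, the remaining indices of the description array are the non-highlighted rows to display as-is.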

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 



 -Original Message-
 From: Jeffrey Chang [mailto:jclal...@gmail.com]
 Sent: Friday, May 27, 2011 12:16 AM
 To: solr-user@lucene.apache.org
 Subject: Re: highlighting in multiValued field
 
 Hi Bob,
 
 I have no idea how I missed that! Thanks for pointing me to use
 hl.snippets
 - that did the magic!
 
 Please allow me squeeze one more question along the same line.
 
 Since I'm now able to display multiple snippets - what I'm trying to
 achieve
 is, determine which highlighted snippet maps back to what position in
 the
 original document.
 
 e.g. If I search for Tel, with highlighting and hl.snippets=2 it'll
 return
 me:
 doc
 ...
 arr name=descID
   str1/str
   str2/str
   str3/str
 /arr
 arr name=description
   strTel to talent 1/str
   strTel to talent 2/str
   strTel to talent 3/str
 /arr
 ...
 /doc
 lst name=highlighting
   lst name=1
   arr name=description
 stremTel/em to talent 1/str
 stremTel/em to talent 2/str
   /arr
 /lst
 ...
 
 Is there a way for me to figure out which highlighted snippet belongs
 to
 which descID so I can display also display the non-highlighted rows for
 my
 search results.
 
 Or is this not the way how highlighting is designed and to be used?
 
 Thanks so much,
 Jeff
 [snip]



RE: Document match with no highlight

2011-05-12 Thread Bob Sandiford
Don't you need to include your unique id field in your 'fl' parameter?  It will 
be needed anyways so you can match up the highlight fragments with the result 
docs once highlighting is working...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 



 -Original Message-
 From: Ahmet Arslan [mailto:iori...@yahoo.com]
 Sent: Thursday, May 12, 2011 7:10 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Document match with no highlight
 
  URL:
 
 http://localhost:8983/solr/select?indent=onversion=2.2q=DOC_TEXT%3A%2
 23+1+15%22fq=start=0
 
 rows=10fl=DOC_TEXT%2Cscoreqt=standardwt=standarddebugQuery=onexpl
 ainOther=hl=onhl.fl=DOC_TEXThl.maxAnalyzedChars=-1
 
  XML:
  ?xml version=1.0 encoding=UTF-8?
  response
    lst name=responseHeader
      int name=status0/int
      int name=QTime19/int
      lst name=params
        str name=explainOther/
        str
  name=indenton/str
        str
  name=hl.flDOC_TEXT/str
        str
  name=wtstandard/str
        str
  name=hl.maxAnalyzedChars-1/str
        str name=hlon/str
        str name=rows10/str
        str
  name=version2.2/str
        str
  name=debugQueryon/str
        str
  name=flDOC_TEXT,score/str
        str name=start0/str
        str name=qDOC_TEXT:3 1
  15/str
        str
  name=qtstandard/str
        str name=fq/
      /lst
    /lst
    result name=response numFound=1 start=0 maxScore=0.035959315
      doc
        float
  name=score0.035959315/float
        arr name=DOC_TEXTstr
  ... /str/arr
      /doc
    /result
    lst name=highlighting
      lst name=123456/
    /lst
    lst name=debug
      str name=rawquerystringDOC_TEXT:3
  1 15/str
      str name=querystringDOC_TEXT:3 1
  15/str
      str
  name=parsedqueryPhraseQuery(DOC_TEXT:3 1
  15)/str
      str
  name=parsedquery_toStringDOC_TEXT:3 1
  15/str
      lst name=explain
        str name=123456
          0.035959315 =
  fieldWeight(DOC_TEXT:3 1 15 in 0), product of: 1.0 =
  tf(phraseFreq=1.0)
          0.92055845 = idf(DOC_TEXT: 3=1
  1=1 15=1)
          0.0390625 =
  fieldNorm(field=DOC_TEXT, doc=0)
      /str
    /lst
    str name=QParserLuceneQParser/str
    arr name=filter_queries
      str/
    /arr
    arr name=parsed_filter_queries/
    lst name=timing
      ...
    /lst
  /response
 
 
 Nothing looks suspicious.
 
 Can you provide two things more;
 fieldType of DOC_TEXT
 and
 field definition of DOC_TEXT.
 
 Also do you get snippet from the same doc, when you remove quotes from
 your query?
 




Test Post

2011-05-10 Thread Bob Sandiford
Hi, all.

Sorry for the 'spam' - I'm just testing that my posts are actually being seen.  
I've sent a few queries over the past couple of weeks and haven't had a single 
response :(

Anyways - if one or two would respond to this, I'd appreciate it - just to let 
me know that I'm being ignored, vs unseen :)

Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/




Problems with Spellchecker in 3.1

2011-04-26 Thread Bob Sandiford

  strLANGUAGE_facet/str

  strPUBDATE_nfacet/str

  strSUBJECT_facet/str

  strABCDEF_cfacet/str

/arr

str name=qtspellcheckedStandard/str

arr name=fq

  strACCESS_LEVEL_nfacet:0/str

  strCLEARANCE_nfacet:0/str

  strNEED_TO_KNOWS_facet:@@EMPTY@@/str

  strCITIZENSHIPS_facet:@@EMPTY@@/str

  strRESTRICTIONS_facet:@@EMPTY@@/str

/arr

str name=facet.mincount1/str

str name=indenttrue/str

str name=hl.fl*/str

str name=rows12/str

str name=hl.snippets5/str

str name=start0/str

str name=qTITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0 OR 
PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR 
DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR 
PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR 
AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR 
textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND 
textFuzzy:rck~0.7/str

  /lst

/lst

result name=response numFound=0 start=0 maxScore=0.0/

lst name=facet_counts

  lst name=facet_queries/

  lst name=facet_fields

lst name=AUTHOR_facet/

lst name=FORMAT_facet/

lst name=LANGUAGE_facet/

lst name=PUBDATE_nfacet/

lst name=SUBJECT_facet/

lst name=ABCDEF_cfacet/

  /lst

  lst name=facet_dates/

  lst name=facet_ranges/

/lst

lst name=highlighting/

lst name=spellcheck

  lst name=suggestions

lst name=rck

  int name=numFound5/int

  int name=startOffset362/int

  int name=endOffset365/int

  int name=origFreq0/int

  arr name=suggestion

lst

  str name=wordrock/str

  int name=freq24000/int

/lst

lst

  str name=wordrick/str

  int name=freq6048/int

/lst

lst

  str name=wordrack/str

  int name=freq84/int

/lst

lst

  str name=wordreck/str

  int name=freq78/int

/lst

lst

  str name=wordruck/str

  int name=freq30/int

/lst

  /arr

/lst

bool name=correctlySpelledfalse/bool

  /lst

/lst

/response







Thanks!


Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/



Problems with Spellchecker in 3.1

2011-04-25 Thread Bob Sandiford
Oops.  Sorry.  I'm hijacking my own thread to put a real Subject in place...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: Bob Sandiford
 Sent: Monday, April 25, 2011 5:34 PM
 To: solr-user@lucene.apache.org
 Subject:
 
 Hi, all.
 
 We're having some troubles with the Solr Spellcheck Response.  We're
 running version 3.1.
 
>  Overview:  If we search for something really ugly like:
>  "kljhklsdjahfkljsdhf book rck"
 
 then when we get back the response, there's a suggestions list for
 'rck', but no suggestions list for the other two words.  For 'book',
 that's fine, because it is 'spelled correctly' (i.e. we got hits on the
 word) and there shouldn't be any suggestions.  For the ugly thing,
 though, there aren't any hits.
 
 The problem is that when we're handling the result, we can't tell the
 difference between no suggestions for a 'correctly spelled' term, and
 no suggestions for something that's odd like this.
 
 (Now - this is happening with searches that aren't as obviously garbage
 - this was just to illustrate the point).
 
 Our setup:
 We're running multiple shards, which may be part of the issue.  For
 example, 'book' might be found in one of the shards, but not another.
 
 I don't *think* this has anything to do with our schema, since it's
 really how the Search Suggestions are being returned to us.
 
 What we'd really like to see is the response coming back with an
 indication that a word wasn't found / had no suggestions.  We've hacked
 around in the code a little bit to do this, but were wondering if
 anyone has come across this, and what approaches you've taken.
 
 Here's the xml we're getting back from the search:
 
 
 ?xml version=1.0 encoding=UTF-8?
 response
 
 lst name=responseHeader
   int name=status0/int
   int name=QTime56/int
   lst name=params
 str name=spellchecktrue/str
 str name=facettrue/str
 str name=sortscore desc, RELEVANCE_SORT_nsort desc/str
 str name=shards.qtspellcheckedStandard/str
 str name=hl.mergeContiguoustrue/str
 str name=facet.limit1000/str
 str name=hltrue/str
 str name=fl ELECTRONIC_ACCESS_display ISBN_display TITLE_boost
 FORMAT_display score MEDIA_TYPE_display AUTHOR_boost LOCALURL_display
 UPC_display id DOC_ID_display CHILD_SITE_display DS_EC
 PRIMARY_AUTHOR_boost PRIMARY_TITLE_boost DS_ID TOPIC_display
 ASSET_NAME_display OCLC_display/str
 str
 name=shardslocalhost:8983/solr/SD_ILS/,localhost:8983/solr/SD_ASSET/
 /str
 arr name=facet.field
   strAUTHOR_facet/str
   strFORMAT_facet/str
   strLANGUAGE_facet/str
   strPUBDATE_nfacet/str
   strSUBJECT_facet/str
   strABCDEF_cfacet/str
 /arr
 str name=qtspellcheckedStandard/str
 arr name=fq
   strACCESS_LEVEL_nfacet:0/str
   strCLEARANCE_nfacet:0/str
   strNEED_TO_KNOWS_facet:@@EMPTY@@/str
   strCITIZENSHIPS_facet:@@EMPTY@@/str
   strRESTRICTIONS_facet:@@EMPTY@@/str
 /arr
 str name=facet.mincount1/str
 str name=indenttrue/str
 str name=hl.fl*/str
 str name=rows12/str
 str name=hl.snippets5/str
 str name=start0/str
>  str name=qTITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0
>  OR PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR
>  DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR
>  PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR
>  AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR
>  textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND
>  textFuzzy:rck~0.7/str
   /lst
 /lst
 result name=response numFound=0 start=0 maxScore=0.0/
 lst name=facet_counts
   lst name=facet_queries/
   lst name=facet_fields
 lst name=AUTHOR_facet/
 lst name=FORMAT_facet/
 lst name=LANGUAGE_facet/
 lst name=PUBDATE_nfacet/
 lst name=SUBJECT_facet/
 lst name=ABCDEF_cfacet/
   /lst
   lst name=facet_dates/
   lst name=facet_ranges/
 /lst
 lst name=highlighting/
 lst name=spellcheck
   lst name=suggestions
 lst name=rck
   int name=numFound5/int
   int name=startOffset362/int
   int name=endOffset365/int
   int name=origFreq0/int
   arr name=suggestion
 lst
   str name=wordrock/str
   int name=freq24000/int
 /lst
 lst
   str name=wordrick/str
   int name=freq6048/int
 /lst
 lst
   str name=wordrack/str
   int name=freq84/int
 /lst
 lst
   str name=wordreck/str
   int name=freq78/int
 /lst
 lst
   str name=wordruck/str
   int name=freq30/int
 /lst
   /arr
 /lst
 bool name=correctlySpelledfalse/bool
   /lst
 /lst
 /response
 
 
 
 Thanks!
 
 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.com


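One way to resolve that ambiguity on the client side, once per-term hit counts are in hand (for example from a cheap rows=0 query per term): a term with no suggestions is "correct" only if it actually has hits; no suggestions and no hits means "unknown". The class, method, and input shapes below are a hypothetical sketch, not Solr API:

```java
import java.util.*;

/** Classification of each query term from a spellcheck response. */
enum TermStatus { CORRECT, MISSPELLED_WITH_SUGGESTIONS, UNKNOWN_NO_SUGGESTIONS }

public class SpellcheckTriage {
    /**
     * Decide what each query term's absence from the suggestions list means.
     *
     * @param queryTerms     the terms the user searched for
     * @param suggestedTerms terms Solr returned suggestion lists for
     * @param hitCounts      per-term hit counts, e.g. from separate rows=0 queries
     */
    public static Map<String, TermStatus> triage(List<String> queryTerms,
                                                 Set<String> suggestedTerms,
                                                 Map<String, Long> hitCounts) {
        Map<String, TermStatus> result = new LinkedHashMap<>();
        for (String term : queryTerms) {
            if (suggestedTerms.contains(term)) {
                result.put(term, TermStatus.MISSPELLED_WITH_SUGGESTIONS);
            } else if (hitCounts.getOrDefault(term, 0L) > 0) {
                // no suggestions because it's a real word with hits
                result.put(term, TermStatus.CORRECT);
            } else {
                // no suggestions and no hits: the indistinguishable case, made explicit
                result.put(term, TermStatus.UNKNOWN_NO_SUGGESTIONS);
            }
        }
        return result;
    }
}
```

With the thread's example, "book" (hits, no suggestions) classifies as CORRECT, "rck" as MISSPELLED_WITH_SUGGESTIONS, and "kljhklsdjahfkljsdhf" as UNKNOWN_NO_SUGGESTIONS. The extra per-term queries cost round trips, which is why patching Solr to flag such terms itself is attractive.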


Solr - upgrade from 1.4.1 to 3.1 - finding AbstractSolrTestCase binaries - help please?

2011-04-20 Thread Bob Sandiford
HI, all.

I'm working on upgrading from 1.4.1 to 3.1, and I'm having some troubles with 
some of the unit test code for our custom Filters.  We wrote the tests to 
extend AbstractSolrTestCase, and I've been reading the thread about the 
test-harness elements not being present in the 3.1 distributables. [1]

So, I have checked out the 3.1 branch code and built that (ant 
generate-maven-artifacts), and I've found the 
lucene-test-framework-3.1-xxx.jar(s).  However, these contain only the lucene 
level framework elements, and none of the solr.

Did the solr test framework actually get built and embedded in one of the solr 
jars somewhere?  Or, if not, is there some way to build a jar that contains the 
solr portion of the test harnesses?

[1] SOLR-2061: Generate jar containing test classes. https://issues.apache.org/jira/browse/SOLR-2061
Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/


RE: Understanding multi-field queries with q and fq

2011-03-02 Thread Bob Sandiford
Have you looked at the 'qf' parameter?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
_
http://www.cosugi.org/ 




 -Original Message-
 From: mrw [mailto:mikerobertsw...@gmail.com]
 Sent: Wednesday, March 02, 2011 2:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Understanding multi-field queries with q and fq
 
 Anyone understand how to do boolean logic across multiple fields?
 
 Dismax is nice for searching multiple fields, but doesn't necessarily
 support our syntax requirements. eDismax appears to be not available
 until
 Solr 3.1.
 
 In the meantime, it looks like we need to support applying the user's
 query
 to multiple fields, so if the user enters led zeppelin merle we need
 to be
 able to do the logical equivalent of
 
>  fq=field1:"led zeppelin merle" OR field2:"led zeppelin merle"
 
 
 Any ideas?  :)
 
 
 
 mrw wrote:
 
  After searching this list, Google, and looking through the Pugh book,
 I am
  a little confused about the right way to structure a query.
 
  The Packt book uses the example of the MusicBrainz DB full of song
  metadata.  What if they also had the song lyrics in English and
 German as
  files on disk, and wanted to index them along with the metadata, so
 that
  each document would basically have song title, artist, publisher,
 date,
  ..., All_Metadata (copy field of all metadata fields), Text_English,
 and
  Text_German fields?
 
  There can only be one default field, correct?  So if we want to
 search for
  all songs containing (zeppelin AND (dog OR merle)) do we
 
  repeat the entire query text for all three major fields in the 'q'
 clause
  (assuming we don't want to use the cache):
 
  q=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin
 AND
  (dog OR merle)+Text_German:(zeppelin AND (dog OR merle))
 
  or repeat the entire query text for all three major fields in the
 'fq'
  clause (assuming we want to use the cache):
 
  q=*:*fq=(+All_Metadata:zeppelin AND (dog OR
 merle)+Text_English:zeppelin
  AND (dog OR merle)+Text_German:zeppelin AND (dog OR merle))
 
  ?
 
  Thanks!
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-
 with-q-and-fq-tp2528866p2619700.html
 Sent from the Solr - User mailing list archive at Nabble.com.

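Until edismax's qf handling is available, the OR expansion described above can be generated mechanically. A minimal plain-Java sketch (field names are placeholders; the escaping only covers quotes and backslashes, not Solr's full special-character set):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MultiFieldQuery {
    /** Escape the characters that matter inside a quoted Solr phrase. */
    static String escapePhrase(String userInput) {
        return userInput.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Expand one user query across several fields: f1:"q" OR f2:"q" OR ... */
    public static String expand(String userInput, List<String> fields) {
        String phrase = "\"" + escapePhrase(userInput) + "\"";
        return fields.stream()
                     .map(f -> f + ":" + phrase)
                     .collect(Collectors.joining(" OR "));
    }
}
```

For example, expand("led zeppelin merle", List.of("field1", "field2")) produces field1:"led zeppelin merle" OR field2:"led zeppelin merle", suitable for either q or fq depending on whether caching is wanted.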



RE: Solr multi cores or not

2011-02-16 Thread Bob Sandiford
Hmmm.  Maybe I'm not understanding what you're getting at, Jonathan, when you 
say 'There is no good way in Solr to run a query across multiple Solr indexes'.

What about the 'shards' parameter?  That allows searching across multiple cores 
in the same instance, or shards across multiple instances.

There are certainly implications here (like Relevance not being consistent 
across cores / shards), but it works pretty well for us...

Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 



 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Wednesday, February 16, 2011 4:09 PM
 To: solr-user@lucene.apache.org
 Cc: Thumuluri, Sai
 Subject: Re: Solr multi cores or not
 
>  Solr multi-core essentially just lets you run multiple separate,
>  distinct
>  Solr indexes in the same running Solr instance.
>  
>  It does NOT let you run queries across multiple cores at once. The
>  cores are just like completely separate Solr indexes; they are just
>  conveniently running in the same Solr instance. (Which can be easier
>  and
>  more compact to set up than actually setting up separate Solr
>  instances.
>  And they can share some config more easily. And it _may_ have
>  implications on JVM usage, not sure.)
>  
>  There is no good way in Solr to run a query across multiple Solr
>  indexes; whether they are multi-core or single cores in separate Solr
>  instances doesn't matter.
 
 Your first approach should be to try and put all the data in one Solr
 index. (one Solr 'core').
 
 Jonathan
 
 On 2/16/2011 3:45 PM, Thumuluri, Sai wrote:
  Hi,
 
  I have a need to index multiple applications using Solr, I also have
 the
  need to share indexes or run a search query across these application
  indexes. Is solr multi-core - the way to go?  My server config is
  2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the
  recommendation?
 
  Thanks,
  Sai Thumuluri
 
 
 




RE: Using terms and N-gram

2011-02-03 Thread Bob Sandiford
I don't suppose it's something silly like the fact that your indexing chain 
includes 'words=stopwords.txt', and your query chain does not?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
_
Early COSUGI birds get the worm! 
Register by 15 February and get a one time viewing of the three course 
Circulation Basics self-paced training suite.
http://www.cosugi.org/ 




 -Original Message-
 From: openvictor Open [mailto:openvic...@gmail.com]
 Sent: Thursday, February 03, 2011 12:02 AM
 To: solr-user@lucene.apache.org
 Subject: Using terms and N-gram
 
 Dear all,
 
 I am trying to implement an autocomplete system for research. But I am
 stuck
 on some problems that I can't solve.
 
 Here is my problem :
>  I give text like:
>  "the cat is black" and I want to explore all 1-grams to 8-grams for all
>  the
>  text that is passed:
>  "the", "cat", "is", "black", "the cat", "cat is", "is black", etc...
 
 In order to do that I have defined the following fieldtype in my schema
 :
 
 !--Custom fieldtype--
 fieldType name=ngram_field class=solr.TextField
   analyzer type=index
 tokenizer class=solr.LowerCaseTokenizerFactory /
 filter class=solr.CommonGramsFilterFactory words=stopwords.txt
 ignoreCase=true maxGramSize=8
minGramSize=1/
   /analyzer
   analyzer type=query
 tokenizer class=solr.LowerCaseTokenizerFactory /
 filter class=solr.CommonGramsFilterFactory ignoreCase=true
 maxGramSize=8
minGramSize=1/
   /analyzer
 /fieldType
 
 
 Then the following field :
 
 field name=p_title_ngram type=ngram_field indexed=true
 stored=true/
 
 Then I feed solr with some phrases and I was really surprised to see
 that
 Solr didn't behave as expected.
>  I went to the schema browser to see the result for the very profound
>  query:
>  "the cat is black and it rains"
>  
>  The results are quite disappointing: first, 1-grams are not found. Some
>  2-grams
>  are found, like the_cat, and_it, etc... but not what I expected.
 Is there something I am missing here ? (by the way I also tried to
 remove
 the mingramsize and maxgramsize even the words).
 
 Thank you,
 Victor Kabdebon

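For what it's worth, the token stream Victor is after is word n-grams (shingles) over tokens, which in Solr comes from ShingleFilterFactory rather than CommonGramsFilterFactory; the latter only glues common words (stopwords) to their neighbours, which matches the the_cat / and_it output he saw. A plain-Java sketch of the expected expansion, independent of any Solr filter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordShingles {
    /** All word n-grams of the text, for n in [minGram, maxGram]. */
    public static List<String> shingles(String text, int minGram, int maxGram) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                // join tokens[i .. i+n) into one shingle
                out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }
}
```

shingles("the cat is black", 1, 2) yields the, cat, is, black, the cat, cat is, is black, which is the expansion the autocomplete use case needs.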


RE: match count per shard and across shards

2011-01-29 Thread Bob Sandiford
Or - you could add a standard field to each shard, populate with a distinct 
value for each shard, and facet on that field.  Then look at the facet counts 
of the value that corresponds to a shard, and, hey-presto, you're done...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
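A sketch of the request this trick implies, in plain Java (the shard_id field name, hosts, and helper are hypothetical, and URL-encoding of the query is omitted; with SolrJ you would set the same parameters on a SolrQuery):

```java
import java.util.List;

public class ShardCountQuery {
    /** Build the query string for a distributed search that also reports per-shard hit counts. */
    public static String build(String userQuery, List<String> shards, String shardField) {
        return "q=" + userQuery
             + "&shards=" + String.join(",", shards)
             + "&rows=0"                        // we only want counts, not documents
             + "&facet=true"
             + "&facet.field=" + shardField     // each shard indexed one distinct value here
             + "&facet.mincount=1";
    }
}
```

In the response, numFound is the total across all shards, and each facet count under shardField is that shard's contribution, so one request answers both questions.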


 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk]
 Sent: Saturday, January 29, 2011 6:52 PM
 To: solr-user@lucene.apache.org
 Subject: Re: match count per shard and across shards
 
 To my knowledge, the distributed search functionality is intended to be
 transparent, thus no details deriving from it are exposed (e.g. what
 docs come from which shard), so, no, I don't believe it to be possible.
 
 The only way I know right now that you could achieve it is by two (sets
 of) queries. One would be a distributed search across all shards, and
 the other would be a single hit to every shard. To fake such a facet,
 this second set of queries would only need to ask for totals, so it
 could use a rows=0.
 
>  Otherwise you'd have to enhance the distributed search code to expose
 some of this information in its response.
 
 Upayavira
 
 On Sat, 29 Jan 2011 03:48 -0800, csj christiansonnejen...@gmail.com
 wrote:
 
  Hi,
 
>   Is it possible to construct a Solr query that will return the total
>   number
>   of hits there are across all shards, and at the same time get the
>   number of
>   hits per shard?
 
  I was thinking along the lines of a faceted search, but I'm not deep
  enough
  into Solr capabilities and query parameters to figure it out.
 
  Regards,
 
  Christian Sonne Jensen
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-
 shards-tp2369627p2369627.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 ---
 Enterprise Search Consultant at Sourcesense UK,
 Making Sense of Open Source
 




RE: Will Result Grouping return documents that don't contain the specified group.field?

2011-01-06 Thread Bob Sandiford
What if you put in a default value for the group_id field in the Solr schema - 
would that work for you?  E.g. something like 'unknown'.  Then you'll get all 
those with no original group_id value grouped together, and you can 
figure out at display time what you want to do with them.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: Andy [mailto:angelf...@yahoo.com]
 Sent: Thursday, January 06, 2011 3:06 PM
 To: solr-user@lucene.apache.org
 Subject: Will Result Grouping return documents that don't contain the
 specified group.field?
 
 I want to group my results by a field named group_id.
 
 However, some of my documents don't contain the field group_id. But I
 still want these documents to be returned as part of the results as
 long as they match the main query q.
 
 Do I need to do anything to tell Solr that I want those documents?
 
 Thanks.
 
 
 




Special Parent / Child relationship - advice / observations welcome on how to approach this

2010-11-23 Thread Bob Sandiford
 to 
dive into the Solr / Lucene code if that's what it will take - I'd just like an 
indication of what people think would be a good / possible approach before I 
get into that level...  e.g. some way of providing to the Indexer a tuple of 
each found combination of the 5 values, and then doing something (what?) with 
searching for the facet queries

Thanks!


Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/



RE: Empty value/string matching

2010-11-22 Thread Bob Sandiford
One possibility to consider: if you really need documents with specifically 
empty or undefined values (if that's not an oxymoron :)), and you have 
control over the values you send into the indexing, you could set a special 
value that means 'no value'. We've done that in a similar vein, using something 
like '@@EMPTY@@' for a given field, meaning that the original document didn't 
actually have a value for that field.  I.e. it is something very unlikely to be 
a 'real' value - and then we can easily select documents by querying for 
field:@@EMPTY@@ instead of the negated form of the select.  However, we 
haven't considered things like what it does to index size.  It's relatively 
rare for us (that there not be a value), so our 'gut feel' is that it's not 
impacting the indexes very much size-wise or performance-wise.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
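A sketch of that substitution at indexing time, plus the non-negated filter it enables (class and method names are ours, not a Solr API):

```java
public class EmptySentinel {
    /** A value very unlikely to occur in real data. */
    public static final String EMPTY = "@@EMPTY@@";

    /** Value to index: the real value, or the sentinel when the source had none. */
    public static String forIndex(String sourceValue) {
        return (sourceValue == null || sourceValue.trim().isEmpty()) ? EMPTY : sourceValue;
    }

    /** Filter query selecting docs whose source had no value - avoids the negated query. */
    public static String emptyFilter(String field) {
        return field + ":" + EMPTY;
    }
}
```

The design trade-off: a small amount of extra indexed data buys a plain positive-term query (fast, cacheable) instead of a negated range query, at the cost of having to strip the sentinel back out at display time.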

 -Original Message-
 From: Viswa S [mailto:svis...@hotmail.com]
 Sent: Saturday, November 20, 2010 5:38 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Empty value/string matching
 
 
 Erick,
 Thanks for the quick response. The output i showed is on a test
 instance i created to simulate this issue. I intentionally tried to
 create documents with no values by creating xml nodes with field
 name=fieldName/field, but having values in the other fields in a
 document.
 Are you saying that there is no way have a field with no value?, with
 text fields they seem to make sense than for string?.
 You are right on fieldName:[* TO *] results, which basically returned
 all the documents which included the couple of documents in question.
 -Viswa
  Date: Sat, 20 Nov 2010 17:20:53 -0500
  Subject: Re: Empty value/string matching
  From: erickerick...@gmail.com
  To: solr-user@lucene.apache.org
 
  I don't think that's correct. The documents wouldn't be showing
  up in the facets if they had no value for the field. So I think
 you're
  being mislead by the printout from the faceting. Perhaps you
  have unprintable characters in there or some such. Certainly the
  name:  is actually a value, admittedly just a space. As for the
  other, I suspect something similar.
 
  What results do you get back when you just search for
  FieldName:[* TO *]? I'm betting you get all the docs back,
  but I've been very wrong before.
 
  Best
  Erick
 
  On Sat, Nov 20, 2010 at 5:02 PM, Viswa S svis...@hotmail.com wrote:
 
  
   Yes I do have a couple of documents with no values and one with an
 empty
   string. Find below the output of a facet on the fieldName.
   ThanksViswa
  
  
   int name=2/intint name=CASTIGO.4302/intint
   name=GDOGPRODY.4242/intint name=QMAGIC.4122/intint
 name=
   1/int
Date: Sat, 20 Nov 2010 15:29:06 -0500
Subject: Re: Empty value/string matching
From: erickerick...@gmail.com
To: solr-user@lucene.apache.org
   
Are you absolutely sure your documents really don't have any
 values for
FieldName? Because your results are perfectly correct if every
 doc has
   a
value for FieldName.
   
Or are you saying there no such field as FieldName?
   
Best
Erick
   
On Sat, Nov 20, 2010 at 3:12 PM, Viswa S svis...@hotmail.com
 wrote:
   

 Folks,Am trying to query documents which have no values
 present, I have
 used the following constructs and it doesn't seem to work on
 the solr
   dev
 tip (as of 09/22) or the 1.4 builds.1. (*:* AND -FieldName[* TO
 *]) -
 returns no documents, parsedquery was +MatchAllDocsQuery(*:*)
   -FieldName:[*
 TO *]2. -FieldName:[* TO *] -  returns no documents,
 parsedquery was
 -FieldName:[* TO *]3. FieldName: - returns no documents,
   parsedquery was
 empty (str name=parsedquery/)The field is type string,
 using the
 LuceneQParser, I have also tried to see if FieldName:[* TO *]
 if the
 documents with no terms are ignored and didn't seem to be the
 case, the
 result set was everything.Any help would be appreciated.-Viswa

  
  
 



RE: Dynamic creating of cores in solr

2010-11-10 Thread Bob Sandiford
();
}
}


And that's about it.

You could adjust the above so there's only one core per index that you want - 
if you don't do complete reindexes, and don't need the index to always be 
searchable.

Hope that helps...


Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: Nizan Grauer [mailto:niz...@yahoo-inc.com]
 Sent: Tuesday, November 09, 2010 3:36 AM
 To: solr-user@lucene.apache.org
 Subject: Dynamic creating of cores in solr
 
 Hi,
 
 I'm not sure this is the right mail to write to, hopefully you can help
 or direct me to the right person
 
 I'm using solr - one master with 17 slaves in the server and using
 solrj as the java client
 
 Currently there's only one core in all of them (master and slaves) -
 only the cpaCore.
 
 I thought about using multi-cores solr, but I have some problems with
 that.
 
 I don't know in advance which cores I'd need -
 
 When my java program runs, I call for documents to be index to a
 certain url, which contains the core name, and I might create a url
 based on core that is not yet created. For example:
 
 Calling to index - http://localhost:8080/cpaCore  - existing core,
 everything as usual
 Calling to index -  http://localhost:8080/newCore - server realizes
 there's no core newCore, creates it and indexes to it. After that -
 also creates the new core in the slaves
 Calling to index - http://localhost:8080/newCore  - existing core,
 everything as usual
 
>  What I'd like the server side to do is realize by itself whether
>  the core exists or not, and if not - create it
 
 One other restriction - I can't change anything in the client side -
 calling to the server can only make the calls it's doing now - for
 index and search, and cannot make calls for cores creation via the
 CoreAdminHandler. All I can do is something in the server itself
 
 What can I do to get it done? Write some RequestHandler?
>  RequestProcessor? Any other option?
 
 Thanks, nizan



RE: Dynamic creating of cores in solr

2010-11-10 Thread Bob Sandiford
Why not use replication?  Call it inexperience...

We're really early into working with and fully understanding Solr and the best 
way to approach various issues.  I did mention that this was a prototype and 
non-production code, so I'm covered, though :)

We'll take a look at the replication feature...

Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Wednesday, November 10, 2010 3:26 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Dynamic creating of cores in solr
 
 You could use the actual built-in Solr replication feature to
 accomplish
 that same function -- complete re-index to a 'master', and then when
 finished, trigger replication to the 'slave', with the 'slave' being
 the
 live index that actually serves your applications.
 
 I am curious if there was any reason you chose to roll your own
 solution
 using JSolr and dynamic creation of cores, instead of simply using the
 replication feature. Were there any downsides of using the replication
 feature for this purpose that you amerliorated through your solution?
 
 Jonathan
 
 Bob Sandiford wrote:
  We also use SolrJ, and have a dynamically created Core capability -
 where we don't know in advance what the Cores will be that we require.
 
  We almost always do a complete index build, and if there's a previous
 instance of that index, it needs to be available during a complete
 index build, so we have two cores per index, and switch them as
 required at the end of an indexing run.
 
  Here's a summary of how we do it (we're in an early prototype /
 implementation right now - this isn't  production quality code - as you
 can tell from our voluminous javadocs on the methods...)
 
  1) Identify if the core exists, and if not, create it:
 
 /**
   * This method instantiates two SolrServer objects, solr and
 indexCore.  It requires that
   * indexName be set before calling.
   */
  private void initSolrServer() throws IOException
  {
>   String baseUrl = "http://localhost:8983/solr/";
  solr = new CommonsHttpSolrServer(baseUrl);
 
>   String indexCoreName = indexName +
>  SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
  String indexCoreUrl = baseUrl + indexCoreName;
 
  // Here we create two cores for the indexName, if they don't
 already exist - the live core used
  // for searching and a second core used for indexing. After
 indexing, the two will be switched so the
  // just-indexed core will become the live core. The way that
 core swapping works, the live core will always
  // be named [indexName] and the indexing core will always be
 named [indexname]_INDEX, but the
  // dataDir of each core will alternate between [indexName]_1
 and [indexName]_2.
>   createCoreIfNeeded(indexName, indexName + "_1", solr);
>   createCoreIfNeeded(indexCoreName, indexName + "_2", solr);
  indexCore = new CommonsHttpSolrServer(indexCoreUrl);
  }
 
 
 /**
   * Create a core if it does not already exists. Returns true if a
 new core was created, false otherwise.
   */
  private boolean createCoreIfNeeded(String coreName, String
 dataDir, SolrServer server) throws IOException
  {
  boolean coreExists = true;
  try
  {
  // SolrJ provides no direct method to check if a core
 exists, but getStatus will
  // return an empty list for any core that doesn't.
  CoreAdminResponse statusResponse =
 CoreAdminRequest.getStatus(coreName, server);
>   coreExists =
>  statusResponse.getCoreStatus(coreName).size() > 0;
  if(!coreExists)
  {
  // Create the core
>   LOG.info("Creating Solr core: " + coreName);
  CoreAdminRequest.Create create = new
 CoreAdminRequest.Create();
  create.setCoreName(coreName);
>   create.setInstanceDir(".");
  create.setDataDir(dataDir);
  create.process(server);
  }
  }
  catch (SolrServerException e)
  {
  e.printStackTrace();
  }
  return !coreExists;
  }
 
 
  2) Do the index, clearing it first if it's a complete rebuild:
 
  [snip]
  if (fullIndex)
  {
  try
  {
>   indexCore.deleteByQuery("*:*");
  }
  catch (SolrServerException e)
  {
  e.printStackTrace();  //To change body of catch
 statement use File | Settings | File Templates.
  }
  }
  [snip]
 
  various logic, then (we submit batches of 100 :
 
  [snip]
  ListSolrInputDocument docList =
 b.getSolrInputDocumentList();
UpdateResponse rsp;
  try

RE: Facet showing MORE results than expected when its selected?

2010-11-10 Thread Bob Sandiford
Shouldn't the second query have the clause:

   fq=themes_raw:"Hotel en Restaurant"

instead of:

   fq=themes:"Hotel en Restaurant"

Otherwise you're mixing apples (themes_raw) and oranges (themes).

(Notice how I cleverly extended the restaurant theme to be food related :))

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 


 -Original Message-
 From: PeterKerk [mailto:vettepa...@hotmail.com]
 Sent: Wednesday, November 10, 2010 4:34 PM
 To: solr-user@lucene.apache.org
 Subject: Facet showing MORE results than expected when its selected?
 
 
>  A facet shows the number of results that match that facet, e.g. New
>  York (433). So when the facet is clicked, you'd expect that number of
>  results (433).
>  
>  However, I have a facet "Hotel en Restaurant" (321) that, when clicked,
>  shows 370 results! :s
 
 
 1st query:
>  http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=
>  0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1
 
 
 This is (part) of the resultset of my first query
 lst name=facet_counts
 lst name=facet_queries/
 lst name=facet_fields
 lst name=themes_raw
   int name=Hotel en Restaurant321/int
 /lst
 /lst
 lst name=facet_dates/
 lst name=facet_ranges/
 /lst
 
 
 
 Now when I click the facet Hotel en Restaurant,
 it fires my second query:
>  http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:"Hotel
>  en
>  Restaurant"&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_
>  raw&facet.mincount=1
 
 I would expect 321, however I get 370!
 
 
 schema.xml
 field name=themes type=text indexed=true stored=true
 multiValued=true  /
 field name=themes_raw type=string indexed=true stored=true
 multiValued=true/
 copyField source=themes dest=themes_raw/
 --
 View this message in context: http://lucene.472066.n3.nabble.com/Facet-
 showing-MORE-results-than-expected-when-its-selected-
 tp1878828p1878828.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Natural string sorting

2010-10-29 Thread Bob Sandiford
Well, you could do a magnitude notation approach.  Depends on how complex the 
strings are, but based on your examples, this would work:

1) Identify each run of digits in the string.  (This assumes each run is no 
more than 9 digits long.)

2) Insert the length of each run into the string immediately before the run 
itself

So - for sorting - you would have:

string1 -- string11
string10 -- string210
string2 -- string12

which will then sort as string11, string12, string210, but use the original 
strings as the displays you want.
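The two steps above in plain Java (this handles any number of digit runs per string, and assumes each run is under 10 digits, as noted):

```java
public class NaturalSortKey {
    /** Prefix each run of digits with its length, so a plain string sort becomes natural sort. */
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (Character.isDigit(s.charAt(i))) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.append(i - start)      // run length first...
                   .append(s, start, i);   // ...then the digits themselves
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

Index encode(value) into the sort field (e.g. via a custom filter or at ingest time) while keeping the original string in the stored/display field; string1, string2, string10 then sort in that order.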

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 

 -Original Message-
 From: Savvas-Andreas Moysidis
 [mailto:savvas.andreas.moysi...@googlemail.com]
 Sent: Friday, October 29, 2010 4:33 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Natural string sorting
 
 I think string10 is before string2 in lexicographic order?
 
 On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote:
 
 
  Just a quick question about natural sorting of strings.
 
  I've a simple dynamic field in my schema:
 
  fieldType name=string class=solr.StrField sortMissingLast=true
  omitNorms=true/
  field name=nameSort_en type=string indexed=true stored=false
  omitNorms=true/
 
  There are 3 indexed strings for example
  string1,string2,string10
 
  Executing a query and sorting by this field leads to unnatural
 sorting of :
  string1
  string10
  string2
 
  (Some time ago i used Lucene and i was pretty sure that Lucene used a
  natural sort, thus i expected the same from solr)
  Is there a way to sort in a natural order? Config option? Plugin?
 Expected
  output would be:
  string1
  string2
  string10
 
 
  Thanks in advance.
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Natural-string-sorting-
 tp1791227p1791227.html
  Sent from the Solr - User mailing list archive at Nabble.com.