Logic behind Solr creating files in .../data/index path.

2010-09-07 Thread rajini maski
All,

While we post data to Solr, the data gets stored in the .../data/index path
in multiple files with different file extensions.
Not worrying about the extensions, I want to know how this number of
files is determined.
Does anyone know by what logic these multiple index files are created in
the data/index path? If we do an optimize, the number of files gets
reduced; otherwise, some N number of files are created. Based on what
parameter are they created? And how do the sizes of the files vary?


I hope my question is clear.


Re: Logic behind Solr creating files in .../data/index path.

2010-09-07 Thread Ryan McKinley
Check:
http://lucene.apache.org/java/3_0_2/fileformats.html
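As for the "based on what parameter" part of the question: between optimizes, the number of segments (and therefore files) is mostly governed by the merge settings in solrconfig.xml. An illustrative Solr 1.4-era fragment (the values are just examples, not recommendations):

```xml
<indexDefaults>
  <!-- A lower mergeFactor means fewer, larger segments (and fewer files);
       an optimize merges everything down to a single segment. -->
  <mergeFactor>10</mergeFactor>
  <!-- How many buffered documents are flushed into a new segment. -->
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexDefaults>
```

Each flush writes a new segment (one set of files per segment), and background merges periodically combine them, which is why the file count fluctuates between commits.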


On Tue, Sep 7, 2010 at 3:16 AM, rajini maski <rajinima...@gmail.com> wrote:
> All,
>
> While we post data to Solr, the data gets stored in the .../data/index path
> in multiple files with different file extensions.
> Not worrying about the extensions, I want to know how this number of
> files is determined.
> Does anyone know by what logic these multiple index files are created in
> the data/index path? If we do an optimize, the number of files gets
> reduced; otherwise, some N number of files are created. Based on what
> parameter are they created? And how do the sizes of the files vary?
>
>
> I hope my question is clear.



Nutch/Solr

2010-09-07 Thread Yavuz Selim YILMAZ
I tried to combine Nutch and Solr, and want to ask something.

After crawling, Nutch has certain fields such as content, tstamp, and title.

How can I map the content field after crawling? Do I have to change the
Lucene code (for example, to add an extra field)?

Or can this be overcome at the Solr stage?

Any suggestions?

Thx.
--

Yavuz Selim YILMAZ


Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma
Depends on your version of Nutch. At least trunk and 1.1 obey the
solrmapping.xml file in Nutch's configuration directory. I'd suggest you start
with that mapping file and the Solr schema.xml file shipped with Nutch, as it
exactly matches the mapping file.

Just restart Solr with the new schema (or change the mapping), then crawl,
fetch, parse and update your DBs, and then push the index from Nutch to your
Solr instance.
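For reference, the mapping file being discussed (shipped as conf/solrindex-mapping.xml in recent Nutch versions) maps Nutch field names to Solr field names. An illustrative, abridged excerpt (check your own Nutch distribution for the exact field list):

```xml
<mapping>
  <fields>
    <!-- source = Nutch field name, dest = Solr field name -->
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="tstamp" source="tstamp"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```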


On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
> I tried to combine Nutch and Solr, and want to ask something.
>
> After crawling, Nutch has certain fields such as content, tstamp, and title.
>
> How can I map the content field after crawling? Do I have to change the
> Lucene code (for example, to add an extra field)?
>
> Or can this be overcome at the Solr stage?
>
> Any suggestions?
>
> Thx.
> --
>
> Yavuz Selim YILMAZ
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Null pointer exception when mixing highlighter shards q.alt

2010-09-07 Thread Marc Sturlese

I noticed that long ago.
I fixed it by doing this in HighlightComponent's finishStage:

  @Override
  public void finishStage(ResponseBuilder rb) {
    boolean hasHighlighting = true;
    if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {

      Map.Entry<String, Object>[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];

      // TODO: make a generic routine to do automatic merging of id keyed data
      for (ShardRequest sreq : rb.finished) {
        if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
        for (ShardResponse srsp : sreq.responses) {
          NamedList hl = (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
          // patch bug
          if (hl != null) {
            for (int i = 0; i < hl.size(); i++) {
              String id = hl.getName(i);
              ShardDoc sdoc = rb.resultIds.get(id);
              int idx = sdoc.positionInResponse;
              arr[idx] = new NamedList.NamedListEntry<String, Object>(id, hl.getVal(i));
            }
          } else {
            hasHighlighting = false;
          }
        }
      }

      // remove nulls in case not all docs were able to be retrieved
      // patch bug
      if (hasHighlighting) {
        rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
      }
    }
  }
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Null-pointer-exception-when-mixing-highlighter-shards-q-alt-tp1430353p1431253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nutch/Solr

2010-09-07 Thread Yavuz Selim YILMAZ
In fact, I used Nutch version 0.9, but I am thinking of moving to the new
version.

If anybody has done something like that, I want to learn from their
experience.

When indexing an XML file, there are specific fields, and all of them are
interdependent, so duplicates don't happen.

I want to extract specific fields from the content field. Doing such
extraction, the new fields should be indexed as well; it seems to me that
the content would then be indexed twice for every new field.

By the way, any details about how to get new fields from the content would
be helpful.
--

Yavuz Selim YILMAZ


2010/9/7 Markus Jelsma <markus.jel...@buyways.nl>

> Depends on your version of Nutch. At least trunk and 1.1 obey the
> solrmapping.xml file in Nutch's configuration directory. I'd suggest you
> start with that mapping file and the Solr schema.xml file shipped with
> Nutch, as it exactly matches the mapping file.
>
> Just restart Solr with the new schema (or change the mapping), then crawl,
> fetch, parse and update your DBs, and then push the index from Nutch to
> your Solr instance.


> On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
>> I tried to combine Nutch and Solr, and want to ask something.
>>
>> After crawling, Nutch has certain fields such as content, tstamp, and title.
>>
>> How can I map the content field after crawling? Do I have to change the
>> Lucene code (for example, to add an extra field)?
>>
>> Or can this be overcome at the Solr stage?
>>
>> Any suggestions?
>>
>> Thx.
>> --
>>
>> Yavuz Selim YILMAZ
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350




Re: Nutch/Solr

2010-09-07 Thread Markus Jelsma

You should:
- definitely upgrade to 1.1 (1.2 is on the way), and
- subscribe to the Nutch mailing list for Nutch-specific questions.


On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote:
> In fact, I used Nutch version 0.9, but I am thinking of moving to the new
> version.
>
> If anybody has done something like that, I want to learn from their
> experience.
>
> When indexing an XML file, there are specific fields, and all of them are
> interdependent, so duplicates don't happen.
>
> I want to extract specific fields from the content field. Doing such
> extraction, the new fields should be indexed as well; it seems to me that
> the content would then be indexed twice for every new field.
>
> By the way, any details about how to get new fields from the content would
> be helpful.
> --
>
> Yavuz Selim YILMAZ
>
>
> 2010/9/7 Markus Jelsma <markus.jel...@buyways.nl>
>
>> Depends on your version of Nutch. At least trunk and 1.1 obey the
>> solrmapping.xml file in Nutch's configuration directory. I'd suggest you
>> start with that mapping file and the Solr schema.xml file shipped with
>> Nutch, as it exactly matches the mapping file.
>>
>> Just restart Solr with the new schema (or change the mapping), then crawl,
>> fetch, parse and update your DBs, and then push the index from Nutch to
>> your Solr instance.
>>
>> On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote:
>>> I tried to combine Nutch and Solr, and want to ask something.
>>>
>>> After crawling, Nutch has certain fields such as content, tstamp, and
>>> title.
>>>
>>> How can I map the content field after crawling? Do I have to change the
>>> Lucene code (for example, to add an extra field)?
>>>
>>> Or can this be overcome at the Solr stage?
>>>
>>> Any suggestions?
>>>
>>> Thx.
>>> --
>>>
>>> Yavuz Selim YILMAZ
>>
>> Markus Jelsma - Technisch Architect - Buyways BV
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350
>

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Query result ranking - Score independent

2010-09-07 Thread Alessandro Benedetti
Hi all,
I need to retrieve query results with a ranking independent of each
result's default Lucene score, which means assigning the same score to
each query result.
I tried to use a zero boost factor ( ^0 ) to reset each query result's
score to zero.
This strategy seems to work within the example Solr instance, but in my
Solr instance, using a zero boost factor causes a Buffer Exception
(
HTTP Status 500 - null java.lang.IllegalArgumentException at
java.nio.Buffer.limit(Buffer.java:249) at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at
org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
)
Do you know any other technique to reset all the query results' scores to
some fixed constant value?
Each query result should obtain the same score.
Any suggestion?

Thx

-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Alphanumeric wildcard search problem

2010-09-07 Thread Erick Erickson
Thanks for letting us know. What was the magic? I'm still unclear on
what was different between my tests and your implementation; mysteries
like this make me nervous <G>..

Thanks
Erick

On Mon, Sep 6, 2010 at 5:45 PM, Hasnain <hasn...@hotmail.com> wrote:

>
> Finally got it working, thanks for your help and support
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1429315.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Expanded Synonyms + phrase search

2010-09-07 Thread Jak Akdemir
Did you check the ../admin/analysis.jsp page to see how the index and query
analyzers behave?
Usually, when you add "parti socialiste" to synonyms-fr.txt, it should
respond correctly to both the "PS et" and "parti socialiste" queries.
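For reference, such a synonyms-fr.txt entry is a single comma-separated line; an illustrative sketch (exact terms depend on your needs):

```
PS, parti socialiste
```

With expand="true" and ignoreCase="true" on the SynonymFilterFactory, each term on the line is expanded to all the others at index time.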

On Mon, Aug 30, 2010 at 4:55 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:

> Hi,
>
> Several documents from my index contain the phrase "PS et".
> However, "PS" is expanded to "parti socialiste", and a phrase search for
> "PS et" fails.
> A phrase search for "parti socialiste et" succeeds.
>
> Can I have both queries working?
>
>
> Here's the field type:
>
>   <fieldtype name="SyFR" class="solr.TextField">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StandardFilterFactory"/>
>       <!-- Synonyms -->
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt"
>               ignoreCase="true" expand="true"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <charFilter class="solr.MappingCharFilterFactory"
>                   mapping="mapping-ISOLatin1Accent.txt"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StandardFilterFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <charFilter class="solr.MappingCharFilterFactory"
>                   mapping="mapping-ISOLatin1Accent.txt"/>
>     </analyzer>
>   </fieldtype>



Re: Null pointer exception when mixing highlighter shards q.alt

2010-09-07 Thread Ron Mayer
Marc Sturlese wrote:
> I noticed that long ago.
> Fixed it doing in HighlightComponent finishStage:
> ...
>   public void finishStage(ResponseBuilder rb) {
>     ...
>   }

Thanks!   I'll try that


I also seem to have a similar problem with shards + facets -- in particular,
it seems the error occurs when some of the shards have no values for
some of the facets.

Any chance you (or anyone else) have a fix for that one too?

Here's the backtrace I'm getting from a few-days-old svn trunk.

Sep 7, 2010 6:03:58 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:340)
at 
org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:301)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Re: Implementing synonym NewBie

2010-09-07 Thread Jak Akdemir
If you expect to improve your synonyms file over time, I would recommend
query-time expansion. That way you don't have to re-index when you need to
add something more.

On Sat, Aug 28, 2010 at 10:01 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:

> Hi All,
>
> I want to use synonyms for my search.
> I am still in the learning phase of Solr, so please help me implement
> synonyms in my search.
> According to the wiki, synonyms can be implemented in two ways:
> 1. at index time
> 2. at search time
>
> I have a combination of 10 phrases for synonyms, so which will be better
> in my case?
> Something like: live show in new york = live show in california = live
> show = live show in DC = live show in USA
> Will synonyms affect my original search?
>
> thanks
> with regards
> Jonty



ankita shinde wants to chat

2010-09-07 Thread ankita shinde
---

ankita shinde wants to stay in better touch using some of Google's coolest new
products.

If you already have Gmail or Google Talk, visit:
http://mail.google.com/mail/b-d1bf7a33e2-4d170858b7-C4KO27fMXYsHI1lHg8OOW9Oi-ts
You'll need to click this link to be able to chat with ankita shinde.


How to extend IndexSchema and SchemaField

2010-09-07 Thread Renaud Delbru

 Hi,

I would like to extend the field node in schema.xml by adding new
attributes. For example, I would like to be able to write:

<field type="myField" myattribute="myvalue"/>

and be able to access myattribute directly from the IndexSchema and
SchemaField objects. However, these two classes are final, and also not
very easy to extend.

Are there any other solutions?

thanks,
--
Renaud Delbru


Advice requested. How to map 1:M or M:M relationships with support for facets

2010-09-07 Thread Tim Gilbert
Hi guys,

 

Question:

 

What is the best way to create a Solr schema which supports a
'multivalue' where each value is a two-item pair of event category and
date? I want to have faceted searches, counts, and date-range ability on
both the category and the dates.

 

Details:

 

This is a person database where a Person can have details about them (like
address) and a Person has many Events.  Events have a category (type of
event) and a date for when that event occurred.  At the bottom you will
see a simple diagram showing the relationship.  Briefly, a Person has
many Events, and an Event has a single category and a single person.

 

What I would like to be able to do is:

 

Have a facet which shows all of the event categories, with a 'sub-facet'
that shows Category + date.  For example, if a category was Attended
Conference and the date was 2008-09-08, I'd be able to show a count of all
Attended Conference events, then have a tree-type control and show the
years (for example):
 

Eg.

 

+ Attended Conference (1038)

|

+ 2010 (100)

+--- 2009 (134)

+--- 2008 (234) 

|

+ Another Event Category (23432)

|

+-2010 (234)

+2009 (245)

 

Etc.

 

For scale, I expect to have > 100 event categories and > a million
person_event records on > 250,000 persons.  I don't care very much about
disk space, so if it's 1 GB or 100 GB due to indexing, that's okay if
the solution works (and it's fast! :-))

 

 

Solutions I looked at:

 

*   I looked at poly fields, but they seem to be fixed-length and appear
to be of the same type.  The typical use case was latitude & longitude.  I
don't think this will work because there are a variable number of events
attached to a person.
*   I looked at multiValued, but it didn't seem to permit two fields
having a relationship, i.e. Event Category & Event Date.  It seemed to me
that they need to be broken out.  That's not necessarily a bad thing,
but it didn't seem ideal.
*   I thought about concatenating category & date to create fake
fields strictly for faceting purposes, but I believe that will break
date ranges.  E.g. EventCategoryId + "|" + Date = "1|2009" as a facet
would allow me to show counts for that event type.  Seems a bit unwieldy
to me...
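For what it's worth, the concatenation idea in the last bullet can be sketched in plain Java (all names and values below are made up for illustration): build a `category|year` key per event and count occurrences, which is roughly what faceting on such a fake field would return.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetKeyDemo {
    // Build a "category|year" facet value from an event, as suggested above.
    static String facetKey(String category, LocalDate date) {
        return category + "|" + date.getYear();
    }

    public static void main(String[] args) {
        // Hypothetical person_event rows: (category, event date)
        List<String[]> events = List.of(
                new String[]{"Attended Conference", "2008-09-08"},
                new String[]{"Attended Conference", "2009-01-15"},
                new String[]{"Attended Conference", "2008-03-02"});

        // Count per concatenated key, mimicking facet counts on the fake field.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] e : events) {
            counts.merge(facetKey(e[0], LocalDate.parse(e[1])), 1, Integer::sum);
        }
        System.out.println(counts); // {Attended Conference|2008=2, Attended Conference|2009=1}
    }
}
```

As the bullet notes, this yields the tree counts per category/year but gives up real date-range queries on the concatenated field.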

 

What's the group's advice for handling this situation in the best way?

 

Thanks in advance; as always, sorry if this question has been asked and
answered a few times already.  I googled for a few hours before writing
this... but things change so fast with Solr that any article older than
a year was suspect to me; also there are so many patches that provide
additional functionality...

 

Tim

 

 

 

 

Schema:

 



Re: How to give path in SCRIPT tag?

2010-09-07 Thread Simon Willnauer
ankita,

your question seems to be somewhat unrelated to Solr / Lucene and
should be asked somewhere else, not on this list. Please try to
keep the focus of your questions on Solr-related topics, or use
java-user@ for Lucene-related topics.

Thanks,

Simon

On Tue, Sep 7, 2010 at 3:46 PM, ankita shinde <ankitashinde...@gmail.com> wrote:
> How do I give the path of a folder stored on my local machine in the 'src'
> attribute of a script tag in an HTML file's head tag?
>
> Is this correct?
>
>  <script type="text/javascript"
>  src="C:/evol/core/AbstractManager.js"></script>



RE: solr user

2010-09-07 Thread Dave Searle
You probably need to use the file:// moniker. If you're using Firefox, install
Firebug and use the Net panel to see whether the includes load.

-Original Message-
From: ankita shinde [mailto:ankitashinde...@gmail.com] 
Sent: 07 September 2010 18:22
To: solr-user@lucene.apache.org
Subject: solr user

hello all,

I am working with AJAX Solr. I am trying to send a request to Solr to
retrieve all XML documents. I have created one folder named 'source' in the
C drive; the source folder contains all the .js files. I have tried the
following code, but it gives the error "AjaxSolr is not defined".
Can anyone please guide me?




<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>AJAX Solr</title>

  <link rel="stylesheet" type="text/css" href="css/reuters.css"
        media="screen" />

  <script type="text/javascript"
          src="C:/source/AbstractManager.js"></script>
  <script type="text/javascript" src="C:/source/Manager.jquery.js"></script>
  <script type="text/javascript" src="C:/source/Parameter.js"></script>
  <script type="text/javascript" src="C:/source/ParameterStore.js"></script>
  <script type="text/javascript" src="C:/source/AbstractWidget.js"></script>
  <script type="text/javascript" src="C:/source/ResultWidget.2.js"></script>

  <script type="text/javascript" src="thm.2.js"></script>

  <script type="text/javascript" src="jquery.min.js"></script>
  <script type="text/javascript" src="retuers.js"></script>

  <script type="text/javascript" src="C:/source/Core.js"></script>

</head>
<body>
  <div id="wrap">
    <div id="header">
      <h1>AJAX Solr Demonstration</h1>
      <h2>Browse Reuters business news from 1987</h2>
    </div>

    <div class="right">
      <div id="result">
        <div id="navigation">
          <ul id="pager"></ul>
          <div id="pager-header"></div>
        </div>
        <div id="docs"></div>
      </div>
    </div>

    <div class="left">
      <h2>Current Selection</h2>
      <ul id="selection"></ul>

      <h2>Search</h2>
      <span id="search_help">(press ESC to close suggestions)</span>
      <ul id="search">
        <input type="text" id="query" name="query"/>
      </ul>

      <h2>Top Topics</h2>
      <div class="tagcloud" id="topics"></div>

      <h2>Top Organisations</h2>
      <div class="tagcloud" id="organisations"></div>

      <h2>Top Exchanges</h2>
      <div class="tagcloud" id="exchanges"></div>

      <h2>By Country</h2>
      <div id="countries"></div>
      <div id="preview"></div>

      <h2>By Date</h2>
      <div id="calendar"></div>

      <div class="clear"></div>
    </div>
    <div class="clear"></div>
  </div>
</body>
</html>


Re: Query result ranking - Score independent

2010-09-07 Thread Grant Ingersoll

On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:

> Hi all,
> I need to retrieve query results with a ranking independent of each
> result's default Lucene score, which means assigning the same score to
> each query result.
> I tried to use a zero boost factor ( ^0 ) to reset each query result's
> score to zero.
> This strategy seems to work within the example Solr instance, but in my
> Solr instance, using a zero boost factor causes a Buffer Exception
> (
> HTTP Status 500 - null java.lang.IllegalArgumentException at
> java.nio.Buffer.limit(Buffer.java:249) at
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
> at
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
> at
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
> at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
> org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
> org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at
> org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
> at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
> )

Hmm, that stack trace doesn't align with the boost factor. What was your
request? I think there might be something else wrong here.

> Do you know any other technique to reset all the query results' scores to
> some fixed constant value?
> Each query result should obtain the same score.
> Any suggestion?


The ConstantScoreQuery or a Filter should do this.  You could do something like:

q=*:*&fq=the real query, as in q=*:*&fq=field:foo

-Grant


--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-07 Thread MitchK

What if we do not care about the version of a document at index time?

When it comes to distributed search, we currently aggregate documents
based on their uniqueKey. But what if we additionally decided on uniqueKey
plus indexing date, so that we only aggregate the last indexed version of a
document?

The concept could look like this:
When Solr aggregates the documents for a response, it could record which
shard responded with an older version of document x.

Now a crawler can crawl through our SolrCloud, asking each shard whether it
noticed something like a "shard y got an older version of doc x" case.
The crawler aggregates this information. After it finishes crawling, it
sends delete-by-query requests to those shards which hold older versions of
documents than they should.

I will call these stored document versions that are older than the newest
version ODVs (Old Document Versions) for better understanding.

So, what can happen:
Before the crawler can visit shard A, which noticed that shard y stores an
ODV of doc x, shard A can go down. That's okay, because either another
shard noticed the same thing, or shard A will be available again later. If
that information is stored on disk, it will also still be available; if it
was stored in RAM, the information is lost... however, you could replicate
that information over more than one shard, right? :-)

Another case:
Shard y can go down, so someone has to take care of storing the noticed
ODV information, so that the document can be deleted when shard y comes
back.

Pros:
- You can do something like consistent hashing in connection with a concept
where each node has to care for its neighbour nodes. This is because only
the neighbour nodes can store ODVs.

- Using the described concept, you can run nightly batches looking for ODVs
in the neighbour nodes.

- ODVs will be found at request time, so we can avoid returning ODVs
instead of newer versions.

Cons:
- We waste disk space.

- This works only for smaller clusters, not for large ones where the number
of machines changes very frequently.

... this is just another idea - and it is very, very lazy.

I must emphasize that I assume neighbour machines do not go down very
frequently. Of course, it is not a question of whether a machine crashes,
but when it crashes - but I assume that the same server does not crash
every hour. :-)
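As a side note, the consistent hashing mentioned under "Pros" can be sketched in plain Java (node names and the hash function here are arbitrary; a real implementation would use many virtual nodes per server and a stronger hash): nodes sit on a ring, a document is routed to the first node at or after its hash, and adding a node only remaps the keys between it and its predecessor.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRingSketch {
    // Hash ring: position -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        // Mask the sign bit to keep positions non-negative.
        ring.put(node.hashCode() & 0x7fffffff, node);
    }

    String nodeFor(String docId) {
        int h = docId.hashCode() & 0x7fffffff;
        // First node clockwise from the document's hash; wrap around if needed.
        SortedMap<Integer, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        HashRingSketch r = new HashRingSketch();
        r.addNode("shard-A");
        r.addNode("shard-B");
        System.out.println(r.nodeFor("doc-42")); // routes consistently to one shard
    }
}
```

Because only nearby ring positions are affected when a node joins, the "neighbour nodes" described above are exactly the ones that can end up holding ODVs.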

Thoughts?

Kind regards


Andrzej Bialecki wrote:
>
> On 2010-09-06 16:41, Yonik Seeley wrote:
>> On Mon, Sep 6, 2010 at 10:18 AM, MitchK <mitc...@web.de> wrote:
>>> [...consistent hashing...]
>>> But it doesn't solve the problem at all. Correct me if I am wrong, but:
>>> if you add a new server, let's call it IP3-1, and IP3-1 is nearer to the
>>> current resource X, then doc x will be indexed at IP3-1 - even if IP2-1
>>> holds the older version.
>>> Am I right?
>>
>> Right.  You still need code to handle migration.
>>
>> Consistent hashing is a way for everyone to be able to agree on the
>> mapping, and for the mapping to change incrementally.  i.e. you add a
>> node and it only changes the docid->node mapping of a limited percent
>> of the mappings, rather than changing the mappings of potentially
>> everything, as a simple MOD would do.
>
> Another strategy to avoid excessive reindexing is to keep splitting the
> largest shards, and then your mapping becomes a regular MOD plus a list
> of these additional splits. Really, there's an infinite number of ways
> you could implement this...
>
>> For SolrCloud, I don't think we'll end up using consistent hashing -
>> we don't need it (although some of the concepts may still be useful).
>
> I imagine there could be situations where a simple MOD won't do ;) so I
> think it would be good to hide this strategy behind an
> interface/abstract class. It costs nothing, and gives you flexibility in
> how you implement this mapping.
>
> --
> Best regards,
> Andrzej Bialecki
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p1434329.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-07 Thread MitchK

I must add something to my last post:

When saying it could be used together with techniques like consistent
hashing, I mean it could be used at indexing time for indexing documents,
since I assumed that the number of shards does not change frequently, and
therefore an ODV case becomes relatively infrequent. Furthermore, the
overhead of searching for and removing those ODV documents is relatively
low.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p1434364.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search Results optimization

2010-09-07 Thread Chris Hostetter

: also my request handler looks like this
:
: <requestHandler name="mb_artists" class="solr.SearchHandler">
:   <lst name="defaults">
:     <str name="defType">dismax</str>
:     <str name="qf">name^2.4</str>
:     <str name="tie">0.1</str>
:   </lst>
: </requestHandler>

that request handler doesn't match up with the output you posted in your
previous message -- according to it, you were using qt=standard1 (not
qt=mb_artists).  the output you posted shows you using a query parser that
searched for each word in the text field, not the name field.  it also
didn't appear to be the dismax parser at all.

Since there seems to be some confusion about which handler/parser you are
actually using, i suggest getting to the bottom of that; it might
explain a lot about the results you are getting.

: I really need some help on this,
: again, what I want is...if I search for swingline red stapler, In results,
: docs that have all three keywords should come on top, then docs that have
: any 2 keywords and then docs with 1 keyword, i mean in my sorted order.
: thanks

Because of the disconnects mentioned above, i didn't look too closely at
the score explanations you posted (it's hard to make sense of them since
they search a field named text and you only posted info about the name
field) but if, as you mentioned, you're already omitting norms and term
freq / positions for the name field, then for the most part this is the
sort order you should be getting (if/when you search against the name
field instead of the text field)

the biggest thing you'll probably have to watch out for with the dismax
parser (if/when you use it) is to explicitly set the 'mm' param to
something like 0, otherwise documents will be excluded if they only match
a small number of the terms.
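In the handler defaults, that would look something like the sketch below (based on the mb_artists handler posted in this thread; see the dismax documentation for the full mm mini-language, which also supports percentages and conditional clauses):

```xml
<requestHandler name="mb_artists" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^2.4</str>
    <str name="tie">0.1</str>
    <!-- mm=0: docs matching only some of the query terms are not excluded;
         coord scoring still ranks docs matching more terms higher -->
    <str name="mm">0</str>
  </lst>
</requestHandler>
```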



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Is there a way to fetch the complete list of data from a particular column in SOLR document?

2010-09-07 Thread bbarani

Hi,

I am trying to get the complete list of unique document IDs and compare it
with that of the back end, to make sure that both the back end and the SOLR
documents are in sync.

Is there a way to fetch the complete list of data from a particular column
in a SOLR document?

Once I get the list, I can easily compare it against the DB and delete the
orphan documents.

Please let me know if there are any other ideas / suggestions to implement
this.

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: shingles work in analyzer but not real data

2010-09-07 Thread Chris Hostetter

: Hi Robert, thanks for the response.  I've looked into the query parsers a
: bit and I did find that using the raw parser on a matching multi-word
: keyword works correctly.  I need to have shingling though, in order to
: support query phrases.  It seems odd to have the query parser emitting

The FieldQParser should work for this -- unlike the raw QParser it uses 
the Analyzer for the specified field, but has no metacharacters of its 
own.
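A hedged sketch (the field name is a placeholder): the field QParser hands the whole query string to that field's analyzer -- shingles and all -- and treats no character as special:

```
q={!field f=keyword_field}swingline red stapler
```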


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



RE: Is there a way to fetch the complete list of data from a particular column in SOLR document?

2010-09-07 Thread Markus Jelsma
q=*:*&fl=id_FIELD&rows=NUM_DOCS ?
 
-Original message-
From: bbarani bbar...@gmail.com
Sent: Tue 07-09-2010 23:09
To: solr-user@lucene.apache.org; 
Subject: Is there a way to fetch the complete list of data from a particular 
column in SOLR document?


Hi,

I am trying to get complete list of unique document ID and compare it with
that of back end to make sure that both back end and SOLR documents are in
sync.

Is there a way to fetch the complete list of data from a particular column
in SOLR document?

Once I get the list, I can easily compare it against the DB and delete the
orphan documents.. 

Please let me know if there are any other ideas / suggestions to implement
this.

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCache.DEFAULT.getInts vs FieldCache.DEFAULT.getStringIndex. Memory usage

2010-09-07 Thread Chris Hostetter

: I need to load a FieldCache for a field which is a solr integer type and has
: as maximum 3 digits. Let's say my index has 10M docs.
: I am wondering what is more optimal and less memory consuming: to load a
: FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex.

By itself, getInts always uses less memory than getStringIndex.  No 
matter what your data looks like, getStringIndex can never use less memory 
than getInts.

The question, however, is whether any other code is going to use getStringIndex 
on the same field, defeating any memory savings you have -- you said 
"integer" but you didn't say what FieldType class that was mapped to.  In 
the 1.4 example schema, "int" is mapped to a TrieField which will use 
getInts() for the field cache.  In Solr 1.3's example schema the "integer" 
type was mapped to IntField, which also uses getInts() for the field 
cache.

But we have no idea what your schema is using.

If it uses SortableIntField, that's when the Solr code under the covers 
is going to use getStringIndex(), so you might as well use it also.

(you can verify this in Solr 1.4 by looking at the stats for the 
fieldCache - it tells you exactly what is in use at any moment)
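A back-of-envelope sketch of why getStringIndex can never win, using the 10M-doc example above (the per-String JVM overhead here is an assumption, not a measured number):

```python
# Rough memory estimate for the two FieldCache options on a 10M-doc
# index whose field holds at most 3-digit integers (<= 1000 distinct
# values).
MAX_DOC = 10_000_000
DISTINCT_TERMS = 1_000
STRING_BYTES = 48           # assumed average bytes per cached String

# getInts(): a single int[maxDoc], 4 bytes per document
get_ints_bytes = 4 * MAX_DOC                       # ~40 MB

# getStringIndex(): the same-size int[maxDoc] order array PLUS a
# lookup array holding the distinct terms -- so it can never be smaller
string_index_bytes = 4 * MAX_DOC + DISTINCT_TERMS * STRING_BYTES
```

With so few distinct terms the difference is tiny here, but the ordering always holds.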

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Is there a way to fetch the complete list of data from a particular column in SOLR document?

2010-09-07 Thread Geert-Jan Brits
Please let me know if there are any other ideas / suggestions to implement
this.

Your indexing program should really take care of this IMHO. Each time your
indexer inserts a document to Solr, flag the corresponding entity in your
RDBMS, each time you delete, remove the flag. You should implement this as a
transaction to make sure all is still fine in the unlikely event of a crash
midway.
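As a hedged sketch of the original compare-and-delete idea (the function and ids are placeholders; the Solr side would come from something like q=*:*&fl=id with rows set to the number of documents):

```python
def find_orphans(solr_ids, db_ids):
    """Return ids present in Solr but no longer in the database."""
    return set(solr_ids) - set(db_ids)

# Document d2 exists only in Solr, so it is the orphan to delete.
orphans = find_orphans(["d1", "d2", "d3"], ["d1", "d3"])
```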

2010/9/7 bbarani bbar...@gmail.com


 Hi,

 I am trying to get complete list of unique document ID and compare it with
 that of back end to make sure that both back end and SOLR documents are in
 sync.

 Is there a way to fetch the complete list of data from a particular column
 in SOLR document?

 Once I get the list, I can easily compare it against the DB and delete the
 orphan documents..

 Please let me know if there are any other ideas / suggestions to implement
 this.

 Thanks,
 Barani
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Download document from solr

2010-09-07 Thread Chris Hostetter

: Subject: Download document from solr
: References: aanlkti=ajq4qpifn2r0dyz=s9hv1i=pc-nqnxp3hw...@mail.gmail.com
: In-Reply-To: aanlkti=ajq4qpifn2r0dyz=s9hv1i=pc-nqnxp3hw...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking





-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Chris Hostetter

:  But when I enable mlt inside the query it returns the results for jp_ as
: well, because job_title also exist in job posting ( though jp_ or cp_
: already differentiating to both of this ?)

I don't believe the MLT Component has any way of filtering like this.  In 
your case you want the fq params to apply to the MLT results as well as 
the main results, but in other cases people want the fq to apply to the 
main result set and let the MLT be per individual doc with no other 
filters -- no one has implemented a configurable way to say when/if 
certain fqs should apply in the way you describe.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Deploying Solr 1.4.1 in JbossAs 6

2010-09-07 Thread Chris Hostetter

: 1-extract the solr.war
: 2-edit the web.xml for setting solr/home param
: 3-create the solr.war
: 4-setup solr home directory
: 5-copy the solr.war to JBossAs 6 deploy directory
: 7-start the jboss server

I don't know a lot about JBoss, but from what I understand there really 
shouldn't be any need to customize the solr.war.

You should be able to use JNDI to set the solr home dir, just like with 
Tomcat...
http://docs.jboss.org/jbossweb/latest/jndi-resources-howto.html
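As a sketch only -- the exact file name and location vary by JBoss version, so verify against the docs above -- a Tomcat-style context entry would look something like:

```xml
<Context>
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```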


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Solr, c/s type ?

2010-09-07 Thread Chris Hostetter

: Subject: Solr, c/s type ?
: 
: i'm wondering c/s type is possible (not http web type).
: if possible, could i get the material about it?

You're going to need to provide more info explaining what it is you are 
asking about -- I don't know about anyone else, but I honestly have 
absolutely no idea what you might possibly mean by "c/s type is possible 
(not http web type)"

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



RE: Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Markus Jelsma
I can think of two useful cases for an optional mlt.fq parameter that 
would limit the MLT results for each document, based on that fq:

 

1. prevent irrelevant docs when in a deep faceted navigation

2. general search results with MLT where you need to distinguish between 
collections when there are many different collections sharing the same index
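To make case 2 concrete, the proposed parameter -- which, to be clear, does not exist in Solr today; this is purely the feature being requested -- might be used like:

```
http://localhost:8983/solr/select?q=foo&mlt=true&mlt.fl=content&mlt.fq=collection:jobs
```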

 


 
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Tue 07-09-2010 23:32
To: solr-user@lucene.apache.org; 
Subject: Re: MoreLikethis and fq not giving exact results ?

I don't believe the MLT Component has anyway of filtering like this.  In 
your case you want the fq params to apply to the MLT results as well as 
the main results, but in other cases people wantthe fq to apply to the 
main result set and let the MLT be per individual doc with no ohter 
filters -- no one has implemented a configurable way to say when/if 
certain fqs should apply in the way you describe.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!


 


Re: Is semicolon a character that needs escaping?

2010-09-07 Thread Chris Hostetter


: Subject: Is semicolon a character that needs escaping?
...
: From this I conclude that there is a bug either in the docs or in the
: query parser or I missed something. What is wrong here?

Back in Solr 1.1, the standard query parser treated ; as a special 
character and looked for sort instructions after it.  

Starting in Solr 1.2 (released in 2007) a sort param was added, and 
semicolon was only considered a special character if you did not 
explicitly mention a sort param (for back compatibility).

Starting with Solr 1.4, the default was changed so that semicolon wasn't 
considered a meta-character even if you didn't have a sort param -- you 
have to explicitly select the lucenePlusSort QParser to get this 
behavior.

I can only assume that if you are seeing this behavior, you are either 
using a very old version of Solr, or you have explicitly selected the 
lucenePlusSort parser somewhere in your params/config.

This was heavily documented in CHANGES.txt for Solr 1.4 (you can find 
mention of it when searching for either ; or semicolon)
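For example, on Solr 1.4 the old semicolon-as-sort-separator behavior only kicks in when the lucenePlusSort parser is selected explicitly, e.g.:

```
http://localhost:8983/solr/select?defType=lucenePlusSort&q=ipod;price desc
```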



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



RE: Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Chris Hostetter

: I can think of two useful cases for a feature that limits MLT results 
: depending with an optional mlt.fq parameter that limits the MLT results 
: for each document, based on that fq:

I don't disagree with you -- I was just commenting that it doesn't work 
that way at the moment, because it was designed with different use cases 
in mind (returning docs related to the result docs, independent of how you 
found those result docs)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



RE: Re: MoreLikethis and fq not giving exact results ?

2010-09-07 Thread Markus Jelsma
I know =)

 

I was just polling votes for a feature request - there is no such issue filed 
for this component. Perhaps there should be?
 
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Wed 08-09-2010 00:13
To: solr-user@lucene.apache.org; 
Subject: RE: Re: MoreLikethis and fq not giving exact results ?

I don't disagree with you -- I was just commenting that it doesn't work 
that way at the moment, because it was designed with different use cases 
in mind (returning docs related to the result docs, independent of how you 
found those result docs)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!



Re: stream.url

2010-09-07 Thread Chris Hostetter

:    I used escape characters and made it... It is not a problem for
: a single file like 'solr apache' but it shows the same problem for files
: like Wireless lan.ppt, Tom info.pdf.

Since you haven't told us what the original URL is that you are trying to 
pass as a value for the stream.url param, it's impossible for us to guess 
whether your URL escaping is working properly.

Bear in mind that you need to escape URL metacharacters *twice* for this 
type of thing -- once to encode the URL in a way that the final server 
will recognize it, and once again to pass it as a value in a URL to Solr.

Since you explicitly mention having problems with white space, but I don't 
see any %25 or %2B sequences in your URL, I'm going to guess that the 
problem is you are not double escaping the white space properly -- the 
first time you escape it, it should be either + or %20, which means the 
second time it should be either %2B or %2520.
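A minimal sketch of the double-escaping, here using Python's urllib (the filename is one of those mentioned above):

```python
from urllib.parse import quote

# First pass: encode the URL so the remote file server understands it.
once = quote("Wireless lan.ppt")    # 'Wireless%20lan.ppt'

# Second pass: encode again so the value survives being embedded as the
# stream.url parameter of the request sent to Solr ('%' becomes '%25').
twice = quote(once)                 # 'Wireless%2520lan.ppt'
```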


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Help with partial term highlighting

2010-09-07 Thread Jed Glazner

Hello Everyone,

Thanks for taking time to read through this.  I'm using a checkout from
the solr 3.x branch

My problem is with the highlighter and wildcards, and is exactly the
same as this guy's but I can't find a reply to his problem:

http://search-lucene.com/m/EARFMs6eR4/partial+highlight+wildcard&subj=Re+old+wildcard+highlighting+behaviour

I can get the highlighter to work with wildcards just fine; the problem
is that Solr returns the whole matched term highlighted, when what I want
is to highlight only the characters in the term that were matched.

Example:

http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true

The results that come back look like this:

<em>Welcome</em> to the Jungle

What I want them to look like is this:
<em>Wel</em>come to the Jungle

From what I gathered by searching the archives, Solr 1.1 used to
do this... Is there any way to get what I want without customizing the
highlighting feature?

Thanks!



Null Pointer Exception with shardsfacets where some shards have no values for some facets.

2010-09-07 Thread Ron Mayer
Short summary:
  * Mixing Facets and Shards gives me a NullPointerException
when not all docs have all facets.
  * Attached patch improves the failure mode, but still
spews errors in the log file
  * Suggestions how to fix that would be appreciated.


In my system, I tried separating out a couple similar but different
types of documents into a couple different shards.

Both shards have the identical schema; with the facets defined as a 
dynamicfield:
  <dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true" />
Some facets only have documents with a value for them in the first shard,
Other facets only have documents with a value for them in the second shard.

When I try to do a query that asks for a facet.field that only has
values in the first shard, and for a different facet.field that only
has values in the second shard, I'm getting this
exception:

Sep 7, 2010 4:55:38 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at 
org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:340)
at 
org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:301)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

I don't have a real simple test case yet; but could work on one if
it'd make it easier to track down.Also, I could post the schema
and solrconfig if that'd help.





The attached patch seems to mostly work for me; in that it's returning
valid search results and at least some facet information, but with
that patch I'm then getting this exception showing up:

Sep 7, 2010 5:28:30 PM org.apache.solr.common.SolrException log
SEVERE: Exception during facet 
counts:org.apache.lucene.queryParser.ParseException: Expected identifier at pos 
20 str='{!terms=$involvement/race_facet__terms}involvement/race_facet'
at 
org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718)
at 
org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165)
at 
org.apache.solr.search.QueryParsing.getLocalParams(QueryParsing.java:221)
at 
org.apache.solr.request.SimpleFacets.parseParams(SimpleFacets.java:102)
at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:327)
at 
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:188)
at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
   

Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.

2010-09-07 Thread Yonik Seeley
Thanks for the report Ron, can you open a JIRA issue?
What version of Solr is this?

-Yonik
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8


On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote:
 Short summary:
  * Mixing Facets and Shards give me a NullPointerException
    when not all docs have all facets.
  * Attached patch improves the failure mode, but still
    spews errors in the log file
  * Suggestions how to fix that would be appreciated.


 In my system, I tried separating out a couple similar but different
 types of documents into a couple different shards.

 Both shards have the identical schema; with the facets defined as a 
 dynamicfield:
   <dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true" />
 Some facets only have documents with a value for them in the first shard,
 Other facets only have documents with a value for them in the second shard.

 When I try to do a query that asks for a facet.field that's only
 has values in the first shard, and for a different facet.field
 that only has values in the second shard, I'm getting this
 exception:

 Sep 7, 2010 4:55:38 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NullPointerException
        at 
 org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:340)
        at 
 org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:301)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
        at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

 I don't have a real simple test case yet; but could work on one if
 it'd make it easier to track down.    Also, I could post the schema
 and solrconfig if that'd help.





 The attached patch seems to mostly work for me; in that it's returning
 valid search results and at least some facet information, but with
 that patch I'm then getting this exception showing up:

 Sep 7, 2010 5:28:30 PM org.apache.solr.common.SolrException log
 SEVERE: Exception during facet 
 counts:org.apache.lucene.queryParser.ParseException: Expected identifier at 
 pos 20 str='{!terms=$involvement/race_facet__terms}involvement/race_facet'
        at 
 org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718)
        at 
 org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165)
        at 
 org.apache.solr.search.QueryParsing.getLocalParams(QueryParsing.java:221)
        at 
 org.apache.solr.request.SimpleFacets.parseParams(SimpleFacets.java:102)
        at 
 org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:327)
        at 
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:188)
        at 
 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
       

Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.

2010-09-07 Thread Ron Mayer
Yonik Seeley wrote:
 Thanks for the report Ron, can you open a JIRA issue?

Sure.  I'll do it at work tomorrow morning, hopefully
after I try to verify with a standalone test case.

 What version of Solr is this?

This is trunk as of a few days ago.   I can update to
the latest trunk and check there too.


 -Yonik
 http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
 
 

How to use TermsComponent when I need a filter

2010-09-07 Thread David Yang
Hi,

 

I have a solr index, which for simplicity is just a list of names, and a
list of associations. (either a multivalue field e.g. {A1, A2, A3, A6}
or a string concatenation list e.g. A1 A2 A3 A6)

I want to be able to provide autocomplete but with a specific
association. E.g. Names beginning with Bob in association A5. 

Is this possible? I would prefer not to have to have one index per
association, since the number of associations is pretty large

 

Cheers,

David 



Batch update, order of evaluation

2010-09-07 Thread Greg Pendlebury
Does anyone know with certainty how (or even if) order is evaluated when
updates are performed by batch?

Our application internally buffers solr documents for speed of ingest before
sending them to the server in chunks. The XML documents sent to the solr
server contain all documents in the order they arrived without any settings
changed from the defaults (so overwrite = true). We are careful to avoid
things like HashMaps on our side since they'd lose the order, but I can't be
certain what occurs inside Solr.

Sometimes, if an object has been indexed twice for various reasons, it could
appear twice in the buffer, but the most up-to-date version is always last. I
have, however, observed instances where the first copy of the document is
indexed and the differences in the second copy are missing. Does this sound
likely? And if so, are there any obvious settings I can play with to get the
behavior I desire?

I looked at:
http://wiki.apache.org/solr/UpdateXmlMessages

but there is no mention of order, just the overwrite flag (and I'm unsure
how it is applied internally to an update message) and the deprecated
duplicates flag (which I have no idea about).

Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as
per http://wiki.apache.org/solr/Solrj? There is no mention of order there
either, however.

Thanks to anyone who took the time to read this.

Ta,
Greg


list of filters/factories/Input handlers/blah blah

2010-09-07 Thread Dennis Gearon
Is there a definitive list of:

   filters
inputHandlers

and other 'code fragments' that do I/O processing for Solr/Lucene?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


Re: Advice requested. How to map 1:M or M:M relationships with support for facets

2010-09-07 Thread Lance Norskog
These days the best practice for a 'drill-down' facet in a UI is to 
encode both the unique value of the facet and the displayable string 
into one facet value. In the UI, you unpack and show the display string, 
and search with the full facet string.


If you want to also do date ranges, make a separate matching 'date' 
field. This will store the date twice. Solr schema design is all about 
denormalizing.
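A minimal sketch of that encode/unpack convention (the "|" delimiter and the helper names are arbitrary choices, not a Solr API):

```python
def pack_facet(unique_id, display):
    # Store both the stable id and the displayable label in one value.
    return f"{unique_id}|{display}"

def unpack_facet(value):
    # Split on the first delimiter only, so labels may contain "|".
    unique_id, display = value.split("|", 1)
    return unique_id, display

packed = pack_facet("42", "Attended Conference")
```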


Tim Gilbert wrote:


Hi guys,

*Question:*

What is the best way to create a Solr schema which supports a 
‘multivalue’ where each value is a two-item pair of event category and 
date? I want faceted searches, counts, and date-range queries on both 
the category and the dates.


*Details:*

This is a person database where a Person can have details about them 
(like address) and a Person has many “Events”. Events have a category 
(type of event) and a Date for when that event occurred. At the bottom 
you will see a simple diagram showing the relationship. Briefly, a 
Person has many Events, and an Event has a single category and a single 
person.


What I would like to be able to do is:

Have a facet which shows all of the event categories, with a 
'sub-facet' that shows category + date. For example, if a category was 
"Attended Conference" and the date was 2008-09-08, I'd be able to show a 
count of all "Attended Conference" events, then have a tree-type control 
showing the years (for example):


E.g.

+ Attended Conference (1038)
|
+--- 2010 (100)
+--- 2009 (134)
+--- 2008 (234)
|
+ Another Event Category (23432)
|
+--- 2010 (234)
+--- 2009 (245)

Etc.

For scale, I expect to have > 100 "Event Categories" and > a million 
person_event records on > 250,000 persons. I don't care very much 
about disk space, so whether it's 1 GB or 100 GB due to indexing, that's 
okay if the solution works (and it's fast! :-))


*Solutions I looked at:*

* I looked at poly fields, but they seem to be a fixed length and
  appeared to be the same type. The typical use case was latitude &
  longitude. I don't think this will work because there are a
  variable number of events attached to a person.
* I looked at multiValued, but it didn't seem to permit two fields
  having a relationship, i.e. Event Category & Event Date. It
  seemed to me that they need to be broken out. That's not
  necessarily a bad thing, but it didn't seem ideal.
* I thought about concatenating category & date to create a fake
  field strictly for faceting purposes, but I believe that will
  break date ranges. E.g. EventCategoryId + "|" + Date = 1|2009 as
  a facet would allow me to show counts for that event type. Seems
  a bit unwieldy to me...
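The concatenated-facet idea in the last bullet can at least be regrouped client-side into the tree shown earlier. A rough sketch, where the facet values, counts, and class/method names are all invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Given facet counts on a synthetic "categoryId|year" field (as Solr
// would return them), roll them up to per-category totals for the
// top level of the tree; the per-year counts are the leaves as-is.
public class FacetTree {
    static Map<String, Integer> categoryTotals(Map<String, Integer> facetCounts) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : facetCounts.entrySet()) {
            // Split "categoryId|year" on the first '|' only.
            String category = e.getKey().split("\\|", 2)[0];
            totals.merge(category, e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Pretend these came back from a facet query.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("1|2010", 100);
        counts.put("1|2009", 134);
        counts.put("2|2010", 234);
        System.out.println(categoryTotals(counts)); // {1=234, 2=234}
    }
}
```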

What's the group's advice for handling this situation in the best way?

Thanks in advance; as always, sorry if this question has been asked and 
answered a few times already. I googled for a few hours before writing 
this... but things change so fast with Solr that any article older than 
a year was suspect to me; also, there are so many patches that provide 
additional functionality...


Tim

Schema:



Re: Deploying Solr 1.4.1 in JbossAs 6

2010-09-07 Thread Lance Norskog
Does JBoss still use Tomcat? Tomcat has an external file to configure 
war files in Catalina/localhost. If JBoss is not Tomcat any more, it 
must have a directory and file format somewhere for the external 
configuration of a servlet war.


Lance

Chris Hostetter wrote:

: 1-extract the solr.war
: 2-edit the web.xml for setting solr/home param
: 3-create the solr.war
: 4-setup solr home directory
: 5-copy the solr.war to JBossAs 6 deploy directory
: 7-start the jboss server

I don't know a lot about JBoss, but from what i understand there really
shouldn't be any need to customize the solr.war.

You should be able to use JNDI to set the solr home dir, just like with
tomcat...
http://docs.jboss.org/jbossweb/latest/jndi-resources-howto.html
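With a Tomcat-based container, the JNDI approach linked above usually boils down to a small context fragment along these lines (the paths here are placeholders):

```xml
<!-- e.g. dropped in Catalina/localhost/solr.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```

This keeps the stock solr.war untouched, which is the point Hoss is making.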


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!

   


RE: list of filters/factories/Input handlers/blah blah

2010-09-07 Thread Jonathan Rochkind
Not necessarily definitive, but filters and tokenizers can be found here: 

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Not sure if that's all of the analyzers (which I think is the generic name for 
both tokenizers and filters) that come with Solr, but I believe it's at least 
most of them. It's of course possible to write your own analyzers or use 
third-party analyzers too; if there's a list of such available, I don't know 
about it, but it sure would be handy. 

Some Query parsers, which I _think_ is the right term for things you can pass 
as defType=something or {!type=something}, or one or two other things with 
different key names I forget, can be found here:

http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers
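Concretely, the two forms mentioned above look like this (the parser and params here are just examples):

```
# Two equivalent ways to select the dismax parser:
http://localhost:8983/solr/select?defType=dismax&qf=title&q=ipod
http://localhost:8983/solr/select?q={!dismax qf=title}ipod
```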

Along with lucene and dismax, also mentioned on that page, I _think_ that's 
the complete list of query parsers included with Solr 1.4, but someone PLEASE 
correct me if I'm wrong. It is indeed difficult for me to get a handle on this 
stuff too. 

Other than query parsers and analyzers, I'm not entirely certain what else 
falls in the category of I/O components.  I don't know anything about input 
handlers, myself. 

Jonathan

From: Dennis Gearon [gear...@sbcglobal.net]
Sent: Tuesday, September 07, 2010 10:41 PM
To: solr-user@lucene.apache.org
Subject: list of filters/factories/Input handlers/blah blah

Is there a definitive list of:

   filters
inputHandlers

and other 'code fragments' that do I/O processing for Solr/Lucene?

