Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread ahmed baseet
As far as I know, Maven is a build/management tool for Java projects, quite similar
to Ant, right? No, I'm not using it, so I think I don't need to worry
about those pom files.
But I'm still not able to figure out the error with the classpath/jar files I
mentioned in my previous mails. Shall I try getting those jar files,
specifically the solr-solrj jar that contains the CommonsHttpSolrServer
class files? If yes, can you tell me where to get those jar files
on the web? Has anyone faced similar problems? Please help me fix
these issues.

Thanks,
Ahmed.
On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com
 wrote:

  Can anyone help me selecting the proper pom.xml file out of the bunch of
  *-pom.xml.templates available.
 

 Ahmed, are you using Maven? If not, then you do not need these pom files.
 If
 you are using Maven, then you need to add a dependency to solrj.


 http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e

 --
 Regards,
 Shalin Shekhar Mangar.



Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
The Solr distro contains all the jar files. You can take either the
latest release (1.3) or a nightly build.
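
For reference, a minimal SolrJ sketch of posting an in-memory document (class
names are from the solr-solrj 1.3 jar; the URL and field names are assumptions):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class InMemoryPost {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument(); // built in memory, never written to disk
            doc.addField("id", "doc-1");
            doc.addField("name", "in-memory document");
            server.add(doc);
            server.commit();
        }
    }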

On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet ahmed.bas...@gmail.com wrote:
 As far as I know, Maven is a build/mgmt tool for java projects quite similar
 to Ant, right? No I'm not using this , then I think I don't need to worry
 about those pom files.
 But  I'm still not able to figure out the error with classpath/jar files I
 mentioned in my previous mails. Shall I try getting those jar files,
 specifically that solr-solrj jar that contains commons-http-solr-server
 class files? If yes then can you tell me where to get those jar files from,
 on the web?  Has anyone ever faced similar problems? Please help me fixing
 these silly issues?

 Thanks,
 Ahmed.
 On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

 On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com
 wrote:

  Can anyone help me selecting the proper pom.xml file out of the bunch of
  *-pom.xml.templates available.
 

 Ahmed, are you using Maven? If not, then you do not need these pom files.
 If
 you are using Maven, then you need to add a dependency to solrj.


 http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e

 --
 Regards,
 Shalin Shekhar Mangar.





-- 
--Noble Paul


Re: half width katakana

2009-04-28 Thread Koji Sekiguchi
If you use a CharFilter, you should use a CharStream aware Tokenizer to
correct the term offsets.

There are two CharStreamAware*Tokenizers in trunk/Solr 1.4.
Probably you want to use CharStreamAwareCJKTokenizer(Factory).
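
For example, a fieldType sketch for schema.xml (the mapping file name is an
assumption; point it at your half-width-to-full-width mapping):

    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-halfwidth.txt"/>
        <tokenizer class="solr.CharStreamAwareCJKTokenizerFactory"/>
      </analyzer>
    </fieldType>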

Koji


Ashish P wrote:

After this should I be using same cjkAnalyzer or use charFilter??
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
  

Ashish P wrote:


I want to convert half width katakana to full width katakana. I tried
using
cjk analyzer but not working.
Does cjkAnalyzer do it or is there any other way??
  
  

CharFilter which comes with trunk/Solr 1.4 just covers this type of
problem.
If you are using Solr 1.3, try the patch attached below:

https://issues.apache.org/jira/browse/SOLR-822

Koji







  




Re: highlighting html content

2009-04-28 Thread Christian Vogler
Hi Matt,

On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com wrote:
 I've been toying with setting custom pre/post delimiters and then removing
 them in the client, but I thought I'd ask the list before I go to far with
 that idea :)

this is what I do. I define the custom highlight delimiters as
[solr:hl] and [/solr:hl], and then do a string replace with <em
class="highlight"> / </em> on the search results.

It is simple to implement, and effective.

Best regards
- Christian


Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Koushik Mitra
HI,

I was trying to extract content from an xlsx file for indexing.
However, I am getting a Julian date value for a cell with a date format, and '1.0'
in place of '100%'.
I want to retain the values as they appear in the xlsx file.

Solution appreciated.

Thanks,
Koushik



Re: half width katakana

2009-04-28 Thread Ashish P

Koji san,

Using CharStreamAwareCJKTokenizerFactory is giving me the following error:
SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to
org.apache.solr.analysis.CharStream

Maybe you are typecasting the Reader to a subclass.
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
 
 If you use CharFilter, you should use CharStream aware Tokenizer to 
 correct terms offsets.
 There are two CharStreamAware*Tokenizer in trunk/Solr 1.4.
 Probably you want to use CharStreamAwareCJKTokenizer(Factory).
 
 Koji
 
 
 Ashish P wrote:
 After this should I be using same cjkAnalyzer or use charFilter??
 Thanks,
 Ashish


 Koji Sekiguchi-2 wrote:
   
 Ashish P wrote:
 
 I want to convert half width katakana to full width katakana. I tried
 using
 cjk analyzer but not working.
 Does cjkAnalyzer do it or is there any other way??
   
   
 CharFilter which comes with trunk/Solr 1.4 just covers this type of
 problem.
 If you are using Solr 1.3, try the patch attached below:

 https://issues.apache.org/jira/browse/SOLR-822

 Koji




 

   
 
 
 




Multiple Facet Dates

2009-04-28 Thread Marc Sturlese

Hey there,
I needed multiple date facet functionality: say, for example,
to show the latest results in the last day, last week, and last month. I
wanted to do it with just one query.
The date facet part of solrconfig.xml would look like:

  <str name="facet.date">date_field</str>
  <str name="facet.date.start">NOW/DAY-1DAY</str>
  <str name="facet.date.start">NOW/DAY-7DAY</str>
  <str name="facet.date.start">NOW/DAY-30DAY</str>

  <str name="facet.date.end">NOW/DAY+1DAY</str>
  <str name="facet.date.end">NOW/DAY+1DAY</str>
  <str name="facet.date.end">NOW/DAY+1DAY</str>

  <str name="facet.date.gap">+2DAY</str>
  <str name="facet.date.gap">+8DAY</str>
  <str name="facet.date.gap">+31DAY</str>

What I have done to get it working is make some changes to
getFacetDateCounts() in SimpleFacets.java.

Instead of getting the start, end, and gap params as Strings, I get them as
arrays of Strings, so I have 3 arrays. The first position of each holds the
first start, end, and gap; the same goes for the second and third (in my
example). Once I have them, I do exactly what the function did before, but
for every position of the arrays.
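
A minimal sketch of that change (not the actual patch; countDateRange stands
in for the existing single-range counting logic):

    // Each date-facet parameter is read as a parallel array; the old
    // single-value logic then runs once per position.
    String[] starts = params.getParams("facet.date.start");
    String[] ends   = params.getParams("facet.date.end");
    String[] gaps   = params.getParams("facet.date.gap");
    for (int i = 0; i < starts.length; i++) {
        countDateRange(field, starts[i], ends[i], gaps[i], result);
    }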

The resultant output looks like this:

<lst name="facet_dates">
  <lst name="date_field">
    <int name="2009-04-27T00:00:00Z">21</int>
    <str name="gap">+2DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
    <int name="2009-04-21T00:00:00Z">86</int>
    <str name="gap">+8DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
    <int name="2009-03-29T00:00:00Z">316</int>
    <str name="gap">+31DAY</str>
    <date name="end">2009-04-29T00:00:00Z</date>
  </lst>
</lst>

I am doing this just for testing. It works for me, but the output might be
confusing to parse in other cases (say, when you need to repeat the gap to
cover the whole range).

Does someone think it would be good to have this functionality? If so, I
could post what I have and do it properly if someone points me in the right
direction.

Thanks in advance




Re: half width katakana

2009-04-28 Thread Koji Sekiguchi
The exception is expected if you use a CharStream aware Tokenizer without
CharFilters.

Please see example/solr/conf/schema.xml for the setting of CharFilter and
CharStreamAware*Tokenizer:

   <!-- charFilter + CharStream aware WhitespaceTokenizer  -->
   <!--
   <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
     </analyzer>
   </fieldType>
   -->


Thank you,

Koji


Ashish P wrote:

Koji san,

Using CharStreamAwareCJKTokenizerFactory is giving me following error,
SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to
org.apache.solr.analysis.CharStream

May be you are typecasting Reader to subclass.
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
  
If you use CharFilter, you should use CharStream aware Tokenizer to 
correct terms offsets.

There are two CharStreamAware*Tokenizer in trunk/Solr 1.4.
Probably you want to use CharStreamAwareCJKTokenizer(Factory).

Koji


Ashish P wrote:


After this should I be using same cjkAnalyzer or use charFilter??
Thanks,
Ashish


Koji Sekiguchi-2 wrote:
  
  

Ashish P wrote:



I want to convert half width katakana to full width katakana. I tried
using
cjk analyzer but not working.
Does cjkAnalyzer do it or is there any other way??
  
  
  

CharFilter which comes with trunk/Solr 1.4 just covers this type of
problem.
If you are using Solr 1.3, try the patch attached below:

https://issues.apache.org/jira/browse/SOLR-822

Koji






  
  





  




RE: OutofMemory on Highlightling

2009-04-28 Thread Gargate, Siddharth
Is it possible to read only maxAnalyzedChars from the stored field
instead of reading the complete field into memory? For instance, in my
case, is it possible to read only the first 50K characters instead of the
complete 1 MB of stored text? That would help minimize the memory usage
(though it would still take 50K * 500 * 2 = 50 MB for 500 results).

I would really appreciate some feedback on this issue...

Thanks,
Siddharth


-Original Message-
From: Gargate, Siddharth [mailto:sgarg...@ptc.com] 
Sent: Friday, April 24, 2009 10:46 AM
To: solr-user@lucene.apache.org
Subject: RE: OutofMemory on Highlightling

I am not sure whether lazy loading should help solve this problem. I
have set enableLazyFieldLoading to true but it is not helping.

I went through the code and observed that
DefaultSolrHighlighter.doHighlighting reads all the documents and
the fields for highlighting (in my case, the 1 MB stored field is read for
all documents).

Also, I am confused by the following code in the SolrIndexSearcher.doc()
method:

if(!enableLazyFieldLoading || fields == null) {
  d = searcher.getIndexReader().document(i);
} else {
  d = searcher.getIndexReader().document(i, 
 new SetNonLazyFieldSelector(fields));
}

Are we setting the fields as NonLazy even if lazy loading is enabled?

Thanks,
Siddharth

-Original Message-
From: Gargate, Siddharth [mailto:sgarg...@ptc.com] 
Sent: Wednesday, April 22, 2009 11:12 AM
To: solr-user@lucene.apache.org
Subject: RE: OutofMemory on Highlightling

Here is the stack trace

SEVERE: java.lang.OutOfMemoryError: Java heap space
at
java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.init(String.java:444)
at
org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
at
org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
at
org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
at
org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.j
ava:277)
at
org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176
)
at
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
at
org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java
:482)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultS
olrHighlighter.java:253)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightCo
mponent.java:84)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search
Handler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB
ase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja
va:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j
ava:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica
tionFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt
erChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv
e.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv
e.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java
:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java
:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.
java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:2
86)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:84
5)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(
Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)



-Original Message-
From: Gargate, Siddharth [mailto:sgarg...@ptc.com] 
Sent: Wednesday, April 22, 2009 9:29 AM
To: solr-user@lucene.apache.org
Subject: RE: OutofMemory on Highlightling

I tried disabling the documentCache but still the same issue. 

<documentCache
  class="solr.LRUCache"
  size="0"
  initialSize="0"
  autowarmCount="0"/>



-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Monday, April 20, 2009 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: OutofMemory on Highlightling

Gargate, Siddharth wrote:
 Anybody facing the same issue? Following is my configuration
 ...
 <field name="content" type="text" indexed="true" stored="false"
 multiValued="true"/>
 <field name="teaser" type="text" indexed="false" stored="true"/>

Re: Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Otis Gospodnetic

Koushik,

You didn't say much about how you are doing the extraction.  Note that Solr 
doesn't do any extraction from spreadsheets, even though it has a component 
(known as Solr Cell) to provide that interface.  The actual extraction is done 
by a tool called Tika, or more precisely, POI, both of which are separate 
Apache projects.  Asking there may get you to the solution faster.
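
If you drive the extraction yourself with POI, something like this sketch
keeps the values as Excel displays them (requires a POI build that has
DataFormatter and xlsx support):

    import java.io.FileInputStream;
    import org.apache.poi.ss.usermodel.*;

    public class XlsxValues {
        public static void main(String[] args) throws Exception {
            Workbook wb = WorkbookFactory.create(new FileInputStream(args[0]));
            DataFormatter fmt = new DataFormatter();
            for (Row row : wb.getSheetAt(0)) {
                for (Cell cell : row) {
                    // returns "100%" and formatted dates instead of raw numbers
                    System.out.print(fmt.formatCellValue(cell) + "\t");
                }
                System.out.println();
            }
        }
    }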


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Koushik Mitra koushik_mi...@infosys.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, April 28, 2009 4:17:00 AM
 Subject: Getting incorrect value while trying to extract content from xlsx 
 
 HI,
 
 I was trying to extract content from an xlsx file for indexing.
 However, I am getting julian date value for a cell with date format and '1.0' 
 in 
 place of '100%'.
 I want to retain the value as present in that xlsx file.
 
 Solution appreciated.
 
 Thanks,
 Koushik
 



Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner jbod...@blackboard.com wrote:


 Trying to point multiple Solrs  on multiple boxes at a single shared
 directory is almost certainly doomed to failure; the read-only Solrs won't
 know when the read/write Solr instance has updated the index.


I'm solving the same problem while working with an index stored in a data
grid, and I've created a data-grid listener that watches for segments.gen
file changes and forces Solr to refresh its structures after receiving this
event. You can do the same with a file-system index: write some code
that watches the segments.gen file for changes and kicks Solr when a change
is detected.

It would be great to add such a mechanism to Solr, I mean some abstracted
(via an interface) way to implement index-refresh event sources.

Also there's code in SolrCore which checks index existence by looking into
file system and it would be better to abstract that code too. WDYT? I can
provide patches.
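
A minimal polling sketch of the file-system variant (it hardcodes the
segments.gen name, as noted above; the five-second interval is arbitrary):

    import java.io.File;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;

    public class IndexChangeWatcher implements Runnable {
        private final File segmentsGen;
        private final EmbeddedSolrServer server;
        private long lastSeen;

        public IndexChangeWatcher(File indexDir, EmbeddedSolrServer server) {
            this.segmentsGen = new File(indexDir, "segments.gen");
            this.server = server;
            this.lastSeen = segmentsGen.lastModified();
        }

        public void run() {
            try {
                while (true) {
                    long now = segmentsGen.lastModified();
                    if (now != lastSeen) {
                        lastSeen = now;
                        server.commit(); // forces the core to open a new Searcher
                    }
                    Thread.sleep(5000);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }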

-- 
Andrew Klochkov


Re: how to reset the index in solr

2009-04-28 Thread Erik Hatcher


On Apr 24, 2009, at 1:54 AM, sagi4 wrote:

Can I get a Rake task for clearing the Solr index, I mean rake
index::rebuild? It would be very helpful, and would also avoid
deleting by id manually.


How do you currently build your index?

But making a Rake task to perform Solr operations is generally
pretty trivial.  In Ruby (after gem install solr-ruby):


   require 'solr'
   solr = Solr::Connection.new('http://localhost:8983/solr')
   solr.delete_by_query('*:*')  # e.g. to clear the whole index
   solr.optimize                # for example

Erik



Re: Term highlighting with MoreLikeThisHandler?

2009-04-28 Thread Eric Sabourin
Yes... at least I think so. The highlighting works correctly for me on
another request handler. See below for the request handler of my
MoreLikeThisHandler query.
Thanks for your help... Eric


  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">

      <str name="fl">
        score,id,timestamp,type,textualId,subject,url,server
      </str>

      <str name="echoParams">explicit</str>
      <str name="mlt.match.include">true</str>
      <str name="mlt.interestingTerms">list</str>
      <str name="mlt.fl">subject,requirements,productName,justification,operation_exact</str>
      <int name="mlt.minwl">2</int>
      <int name="mlt.mintf">1</int>
      <int name="mlt.mindf">2</int>

      <str name="hl">true</str>
      <str name="hl.snippets">1</str>
      <!-- for subject and textualID fields, we want no fragmenting, just
           highlighting -->
      <str name="f.textualId.hl.fragsize">0</str>
      <str name="f.subject.hl.fragsize">0</str>
      <str name="f.requirements.hl.fragmenter">regex</str> <!-- defined below -->
      <str name="f.justification.hl.fragmenter">regex</str>
    </lst>
  </requestHandler>


On Mon, Apr 27, 2009 at 11:30 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Eric,

 Have you tried using MLT with parameters described on
 http://wiki.apache.org/solr/HighlightingParameters ?


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Eric Sabourin eric.sabourin2...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Monday, April 27, 2009 10:31:38 AM
  Subject: Term highlighting with MoreLikeThisHandler?
 
  I submit a query to the MoreLikeThisHandler to find documents similar to
 a
  specified document.  This works and I've configured my request handler to
  also return the interesting terms.
 
  Is it possible to have MLT return to me highlight snippets in the similar
  documents it returns? I mean generate hl snippets of the interesting
 terms?
  If so how?
 
  Thanks... Eric




-- 
Eric
Sent from Halifax, NS, Canada


Re: highlighting html content

2009-04-28 Thread Matt Mitchell
Hi Christian,

I decided to do something very similar. How do you handle cases where the
highlighting is inside of HTML/XML tags, though? I'm getting stuff like this:

?q=jackson

<entry type="song" author="Michael <em>Jackson</em>">Bad by Michael
<em>Jackson</em></entry>

I wrote a regular expression to take care of the HTML/XML problem
(highlighting inside of the tag). I'd be interested in seeing your and
others' approaches to this, even if it's just a regular expression.
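
For what it's worth, one sketch of that cleanup in Java (delimiters as in
Christian's scheme; the tag pattern is deliberately naive):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HighlightCleaner {
        private static final Pattern TAG = Pattern.compile("<[^>]*>");

        // Strip markers that landed inside a tag or attribute, then convert the rest.
        public static String clean(String fragment) {
            Matcher m = TAG.matcher(fragment);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String tag = m.group().replace("[solr:hl]", "").replace("[/solr:hl]", "");
                m.appendReplacement(sb, Matcher.quoteReplacement(tag));
            }
            m.appendTail(sb);
            return sb.toString()
                     .replace("[solr:hl]", "<em class=\"highlight\">")
                     .replace("[/solr:hl]", "</em>");
        }
    }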

Matt

On Tue, Apr 28, 2009 at 3:21 AM, Christian Vogler 
christian.vog...@gmail.com wrote:

 Hi Matt,

 On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com
 wrote:
  I've been toying with setting custom pre/post delimiters and then
 removing
  them in the client, but I thought I'd ask the list before I go to far
 with
  that idea :)

  this is what I do. I define the custom highlight delimiters as
  [solr:hl] and [/solr:hl], and then do a string replace with <em
  class="highlight"> / </em> on the search results.

 It is simple to implement, and effective.

 Best regards
 - Christian



Re: Getting incorrect value while trying to extract content from xlsx

2009-04-28 Thread Erik Hatcher
How are you indexing it?   A sample of the CSV file would be helpful.   
Note that while the CSV update handler is very convenient and very  
fast, it also doesn't have much in the way of data massaging/ 
transformation - so it might require you pre-format the data for Solr  
ingestion, or have a programmatic indexer that does this.


Erik

On Apr 28, 2009, at 4:17 AM, Koushik Mitra wrote:


HI,

I was trying to extract content from an xlsx file for indexing.
However, I am getting julian date value for a cell with date format  
and '1.0' in place of '100%'.

I want to retain the value as present in that xlsx file.

Solution appreciated.

Thanks,
Koushik





Re: Solr Performance bottleneck

2009-04-28 Thread Otis Gospodnetic

Hi,

You should probably just look at the index version number to figure out if
the index has changed.  If you are looking at segments.gen, you are looking
at a file that may not exist in Lucene in the future.  Use the IndexReader
API instead.
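
For example (a fragment against the Lucene 2.x API, with imports from
org.apache.lucene.index and org.apache.lucene.store; the path is an
assumption):

    Directory dir = FSDirectory.getDirectory("/var/solr/data/index");
    long seen = IndexReader.getCurrentVersion(dir);
    // later, on each poll: a different version means a new commit point
    boolean changed = IndexReader.getCurrentVersion(dir) != seen;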

By "refreshes" do you mean reopening a new Searcher?  Does commit + a post
commit event not work for you?

By "kicks Solr" I hope you don't mean a Solr/container restart! :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Andrey Klochkov akloch...@griddynamics.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, April 28, 2009 4:57:54 AM
 Subject: Re: Solr Performance bottleneck
 
 On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner wrote:
 
 
  Trying to point multiple Solrs  on multiple boxes at a single shared
  directory is almost certainly doomed to failure; the read-only Solrs won't
  know when the read/write Solr instance has updated the index.
 
 
 I'm solving the same problem while working with index stored in data-grid
 and I've just created a data-grid listener which looks for segments.gen
 file changes and forces Solr to refresh its structures after receiving this
 event. You can do the same job with file system index - write some code
 which looks at segments.gen file changes and kicks solr when a change is
 detected.
 
 It would be great to add such a mechanism to Solr, I mean some abstracted
 (via an interface) way to implement index refresh events sources.
 
 Also there's code in SolrCore which checks index existence by looking into
 file system and it would be better to abstract that code too. WDYT? I can
 provide patches.
 
 -- 
 Andrew Klochkov



Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Glen Newton
Amit,

You might want to take a look at LuSql[1] and see if it may be
appropriate for the issues you have.

thanks,

Glen

[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

2009/4/27 Amit Nithian anith...@gmail.com:
 All,
 I have a few questions regarding the data import handler. We have some
 pretty gnarly SQL queries to load our indices and our current loader
 implementation is extremely fragile. I am looking to migrate over to the
 DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff
 to remotely load the indices so that my index loader and main search engine
 are separated.
 Currently, unless I am missing something, the data gathering from the entity
 and the data processing (i.e. conversion to a Solr Document) is done
 sequentially and I was looking to make this execute in parallel so that I
 can have multiple threads processing different parts of the resultset and
 loading documents into Solr. Secondly, I need to create temporary tables to
 store results of a few queries and use them later for inner joins was
 wondering how to best go about this?

 I am thinking to add support in DIH for the following:
 1) Temporary tables (maybe call it temporary entities)? --Specific only to
 SQL though unless it can be generalized to other sources.
 2) Parallel support
  - Including some mechanism to get the number of records (whether it be
 count or the MAX(custom_id)-MIN(custom_id))
 3) Support in DIH or Solr to post documents to a remote index (i.e. create a
 new UpdateHandler instead of DirectUpdateHandler2).

 If any of these exist or anyone else is working on this (OR you have better
 suggestions), please let me know.

 Thanks!
 Amit




-- 


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread ahmed baseet
Thank you very much. Now its working fine, fixed those minor classpath
issues.

Thanks,
Ahmed.

2009/4/28 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 the Solr distro contains all the jar files. you can take either the
 latest release (1.3) or a nightly

 On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet ahmed.bas...@gmail.com
 wrote:
  As far as I know, Maven is a build/mgmt tool for java projects quite
 similar
  to Ant, right? No I'm not using this , then I think I don't need to worry
  about those pom files.
  But  I'm still not able to figure out the error with classpath/jar files
 I
  mentioned in my previous mails. Shall I try getting those jar files,
  specifically that solr-solrj jar that contains commons-http-solr-server
  class files? If yes then can you tell me where to get those jar files
 from,
  on the web?  Has anyone ever faced similar problems? Please help me
 fixing
  these silly issues?
 
  Thanks,
  Ahmed.
  On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.com
  wrote:
 
   Can anyone help me selecting the proper pom.xml file out of the bunch
 of
   *-pom.xml.templates available.
  
 
  Ahmed, are you using Maven? If not, then you do not need these pom
 files.
  If
  you are using Maven, then you need to add a dependency to solrj.
 
 
 
 http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 



 --
 --Noble Paul



Re: Unique Identifiers

2009-04-28 Thread Erik Hatcher


On Apr 28, 2009, at 9:49 AM, ahammad wrote:

Is it possible for Solr to assign a unique number to every document?


Solr has a UUIDField that can be used for this.  But...


For example, let's say that I am indexing from several databases with
different data structures. The first one has a unique field called  
artID,
and the second database has a unique field called SRNum. If I want  
to have
an interface that allows me to search both of those data sources, it  
makes

it easier to have a single field per document that is common to both
datasources...maybe something like uniqueDocID or something like that.

That field does not exist in the DB. Is it possible for Solr to  
create that

field and assign a number while it's indexing?


I recommend an aggregate unique key field, using maybe this scheme:

   <table-name>-<primary key value>
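
For instance, when building documents with SolrJ (field and variable names
are hypothetical):

    SolrInputDocument doc = new SolrInputDocument();
    // source table + primary key stays unique across both databases
    doc.addField("uniqueDocID", "artTable-" + artID);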

Erik



Re: Snapinstaller on slave solr server | Can not connect to solr server issue

2009-04-28 Thread payalsharma

To add to that:

This issue was occurring because of the commit script called internally by
snapinstaller. The commit script builds the Solr URL to do the commit as
shown below:
curl_url=http://${solr_hostname}:${solr_port}/${webapp_name}/update

commitscript logs:

2009/04/28 18:48:21 started by root
2009/04/28 18:48:21 command:
/opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.PUFFIN.CO.UK/bin/commit
2009/04/28 18:48:21 commit request to Solr at
http://delpearsondm:8080/apache-solr-1.3.0/update failed:
2009/04/28 18:48:21 <html><head><title>Apache Tomcat/6.0.18 - Error
report</title></head><body><h1>HTTP Status 400 - Missing solr core name in
path</h1><HR size="1" noshade="noshade"><p>type Status report</p><p>message
<u>Missing solr core name in path</u></p><p>description <u>The request sent
by the client was syntactically incorrect (Missing solr core name in
path).</u></p><HR size="1" noshade="noshade"><h3>Apache
Tomcat/6.0.18</h3></body></html>
2009/04/28 18:48:21 failed (elapsed time: 0 sec)

The Solr server set up at our end contains multiple cores, and thus forms a
URL like:
http://servername:8080/apache-solr-1.3.0/CORE_WWW.ABCD.COM/update

The core name is not getting appended in the commit script.

Please let me know whether I need to change the commit script to accommodate
the core name in the URL formed, or whether there is some alternate way to
achieve the same without modifying the script.
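
Looking at the script, the URL is assembled from conf/scripts.conf, so one
untested guess would be to fold the core name into webapp_name there, e.g.:

    solr_hostname=localhost
    solr_port=8080
    webapp_name=apache-solr-1.3.0/CORE_WWW.ABCD.COM

so that curl_url expands to
http://localhost:8080/apache-solr-1.3.0/CORE_WWW.ABCD.COM/update.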


Thanks,
Payal


payalsharma wrote:
 
 Hi All,
 
 I m facing an issue while running snapinstaller script on the Slave
 server, scripts installs the latest snapshot , but creates issue while
 making connectivity to the solr server , logs for the same from
 snapinstaller.log :
 
 2009/04/28 18:48:03 command:
 /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin/snapinstaller
 -u webuser
 2009/04/28 18:48:16 installing snapshot
 /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data/snapshot.20090428180619
 2009/04/28 18:48:21 notifing Solr to open a new Searcher
 2009/04/28 18:48:21 failed to connect to Solr server
 2009/04/28 18:48:21 snapshot installed but Solr server has not open a new
 Searcher
 2009/04/28 18:48:21 failed (elapsed time: 18 sec)
 
 I ensured that slave solr server was in running state before calling ...
 snappuller and snapinstaller scripts.
 
 As a result of this issue Slave server's Collection was not displaying the
 indexes of latest installed snapshot, 
 As a temporary solution,  I restarted the Slave  server and Collection got
 refreshed. 
 
 
 Can anybody let me know the probable reason of this behavior.
 





Re: newbie question about indexing RSS feeds with SOLR

2009-04-28 Thread Koji Sekiguchi
Just an FYI: I've never tried, but there seems to be an RSS feed sample in DIH:

http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
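
The sample there boils down to a data-config.xml along these lines (a sketch;
the URL and XPaths must match your feed):

    <dataConfig>
      <dataSource type="HttpDataSource"/>
      <document>
        <entity name="item"
                processor="XPathEntityProcessor"
                url="http://example.com/feed.rss"
                forEach="/rss/channel/item">
          <field column="title" xpath="/rss/channel/item/title"/>
          <field column="link"  xpath="/rss/channel/item/link"/>
        </entity>
      </document>
    </dataConfig>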

Koji

Tom H wrote:
 Hi,

 I've just downloaded solr and got it working, it seems pretty cool.

 I have a project which needs to maintain an index of articles that were
 published on the web via rss feed.

 Basically I need to watch some rss feeds, and search and index the items
 to be searched.

 Additionally, I need to run jobs based on particular keywords or events
 during parsing.

 is this something that I can do with SOLR? Are there any related
 projects using SOLR that are better suited to indexing specific XML
 types like RSS?

 I had a look at the project enormo which appears to be a property
 lettings and sales listing aggregator. But I can see that they must have
 solved some of the problems I am thinking of such as scheduled indexing
 of remote resources, and writing a parser to get data fields from some
 other sites templates.

 Any advice would be welcome...

 Many Thanks,

 Tom




   



Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Koji Sekiguchi

Thanh Doan wrote:

Assuming a solr search returns 10 listing items as below

1) 4 digital cameras
2) 4 LCD televisions
3) 2 clothing items

If we navigate to /electronics  we want solr  to show
us facets specific to 8 electronics items (e.g brand, price).
If we navigate to /electronics/cameraswe want solr  to show us
facets specific to 4 camera items (e.g mega-pixels, screens-size,
brand, price).
If we navigate to /electronics/televisions  we want to see different
facets and their counts specific to TV  items.
If we navigate to /clothing   we want to obtain
totally different facets and their counts.

I am not sure if we can think of this as Hierarchical Facet Navigation
system or not.
From the UI perspective , we can think of /electronics/cameras as
Hierarchical classification.

  

There is a patch for Hierarchical Facet Navigation:

https://issues.apache.org/jira/browse/SOLR-64


But how about electronics/cameras/canon vs electronics/canon/camera.
In this case both navigation should show the same result set no matter
which facet is selected first.

  
The patch allows a document to have multiple hierarchical facet
fields. For example:


<add>
  <doc>
    <field name="name">Canon Brand-new Digital Camera</field>
    <field name="cat">electronics/cameras/canon</field>
    <field name="cat">electronics/canon/cameras</field>
  </doc>
</add>


Koji


My question is with the current solr implementation can we  provide
context dependent faceted navigation from SOLR search results?

Thank you.
Thanh Doan

  




Re: spellcheck.collate causes StringIndexOutOfBoundsException during startup.

2009-04-28 Thread Koji Sekiguchi
I see you are using a firstSearcher/newSearcher event listener on your
startup, and that causes the problem.

If you don't need them, comment them out in solrconfig.xml.

Koji


Eric Sabourin wrote:

I’m using SOLR 1.3.0 (from download, not a  nightly build)

apache-tomcat-5.5.27 on Windows  XP.



When I add <str name="spellcheck.collate">true</str> to my requestHandler in
my solrconfig.xml, I get the StringIndexOutOfBoundsException stack trace
below on startup. Removing the element, or setting it to false, causes the
exception to no longer occur on startup.



Any help is appreciated. Let me know if additional information is required.



Eric



The exception (from logs):

Apr 24, 2009 12:17:53 PM org.apache.solr.servlet.SolrUpdateServlet init

INFO: SolrUpdateServlet.init() done

Apr 24, 2009 12:17:53 PM org.apache.solr.common.SolrException log

SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of
range: -5

   at
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:800)

   at java.lang.StringBuilder.replace(StringBuilder.java:272)

   at
org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:232)

   at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:149)

   at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)

   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1228)

   at
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)

   at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1034)

   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)

   at java.util.concurrent.FutureTask.run(FutureTask.java:123)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)

   at java.lang.Thread.run(Thread.java:595)



Apr 24, 2009 12:17:53 PM org.apache.solr.core.SolrCore execute



Having the following does not cause the exception:

   <str name="spellcheck">true</str>
   <str name="spellcheck.onlyMorePopular">false</str>
   <!-- exr = Extended Results -->
   <str name="spellcheck.extendedResults">false</str>
   <!--  The number of suggestions to return -->
   <str name="spellcheck.count">1</str>
   <str name="spellcheck.dictionary">default</str>
   <!-- comment out collate... causes
java.lang.StringIndexOutOfBoundsException on startup?  -->
   <!-- <str name="spellcheck.collate">true</str> -->



With the following, the exception occurs on startup:

   <str name="spellcheck">true</str>
   <str name="spellcheck.onlyMorePopular">false</str>
   <!-- exr = Extended Results -->
   <str name="spellcheck.extendedResults">false</str>
   <!--  The number of suggestions to return -->
   <str name="spellcheck.count">1</str>
   <str name="spellcheck.dictionary">default</str>
   <!-- comment out collate... causes
java.lang.StringIndexOutOfBoundsException on startup?  -->
   <str name="spellcheck.collate">true</str>







  




Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Matt Mitchell
Wow, this looks great. Thanks for this Koji!

Matt

On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Thanh Doan wrote:

 Assuming a solr search returns 10 listing items as below

 1) 4 digital cameras
 2) 4 LCD televisions
 3) 2 clothing items

 If we navigate to /electronics  we want solr  to show
 us facets specific to 8 electronics items (e.g brand, price).
 If we navigate to /electronics/cameraswe want solr  to show us
 facets specific to 4 camera items (e.g mega-pixels, screens-size,
 brand, price).
 If we navigate to /electronics/televisions  we want to see different
 facets and their counts specific to TV  items.
 If we navigate to /clothing   we want to obtain
 totally different facets and their counts.

 I am not sure if we can think of this as Hierarchical Facet Navigation
 system or not.
 From the UI perspective , we can think of /electronics/cameras as
 Hierarchical classification.



 There is a patch for Hierarchical Facet Navigation:

 https://issues.apache.org/jira/browse/SOLR-64

  But how about electronics/cameras/canon vs electronics/canon/camera.
 In this case both navigation should show the same result set no matter
 which facet is selected first.



 The patch supports a document to have multiple hierarchical facet fields.
 for example:

  <add>
   <doc>
    <field name="name">Canon Brand-new Digital Camera</field>
    <field name="cat">electronics/cameras/canon</field>
    <field name="cat">electronics/canon/cameras</field>
   </doc>
  </add>


 Koji

  My question is with the current solr implementation can we  provide
 context dependent faceted navigation from SOLR search results?

 Thank you.
 Thanh Doan







Re: fail to create or find snapshoot

2009-04-28 Thread Jian Han Guo
I think this is a bug.

I looked at the class SnapShooter, and its constructor looks like this:


public SnapShooter(SolrCore core) {
  solrCore = core;
}

This leaves the variable snapDir null, and the variable is never
initialized elsewhere. Later, in SnapShooter.createSnapshot,
the line

snapShotDir = new File(snapDir, directoryName);

is equivalent to

snapShotDir = new File(directoryName);

because snapDir is null, and therefore the snapshot is created in the
directory where the application was launched. A line should be added to the
constructor like this:


public SnapShooter(SolrCore core) {
  solrCore = core;
  snapDir = core.getDataDir();
}



This is not a problem during development, but it is when you want to deploy
the application to different environments and schedule snapshots for
backup. Can somebody take a look at this problem?

Thanks,

Jianhan



On Mon, Apr 27, 2009 at 12:02 PM, Jian Han Guo jian...@gmail.com wrote:

 Actually, I found the snapshot in the directory where Solr was launched. Is
 this done on purpose? Shouldn't it be in the data directory?

 Thanks,

 Jianhan



 On Mon, Apr 27, 2009 at 11:43 AM, Jian Han Guo jian...@gmail.com wrote:

 Hi,

 According to Solr's wiki page http://wiki.apache.org/solr/SolrReplication,
 if I send the following request to the master, a snapshot will be created:

 http://master_host:port/solr/replication?command=snapshoot

 But after I did it, nothing seemed to happen.

 I got this response back,

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
 </response>

 and I checked the data directory; no snapshot was created.

 I am not sure what to expect after making the request, or where to find
 the snapshot files (and what they are).

 Thanks,

 Jianhan









Unable to import data from database

2009-04-28 Thread Ci-man

I am using MS SQL Server and want to index a table.
I set up my data-config like this:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="25000"
      autoCommit="true"
      driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://localhost:1433;databaseName=MYDB"
      user="" password=""/>

  <document name="products">
    <entity name="item" query="select TOP 50 * from items">
      <field column="item_id" name="id"/>
      <field column="itemname" name="name"/>
      <field column="itemavgbucost" name="price"/>
      <field column="categoryname" name="cat"/>
      <field column="itemdesc" name="features"/>
    </entity>
  </document>
</dataConfig>


I am unable to load data from the database. I always receive 0 documents fetched:
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:12.989</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2009-04-28 14:37:49</str>
</lst>

The query runs in SQL Server query manager and retrieves records. The funny
thing is that even if I purposely write a wrong query with non-existent
tables, I get the same response. What am I doing wrong? How can I tell
whether a query fails or succeeds, or whether Solr is running the query in
the first place?

Any help is appreciated.
Best,
-Ci 





Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Tue, Apr 28, 2009 at 3:18 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Hi,

 You should probably just look at the index version number to figure out if
 the name changed.  If you are looking at segments.gen, you are looking at a
 file that may not exist in Lucene in the future.  Use IndexReader API
 instead.


Yeah, I use IndexReader.isCurrent() to determine if I should refresh Solr
after catching a data grid event. But I have to create that event listener
somehow, and here I have no other way but to hardcode this index file name.
So when some node of the cluster performs commit, other nodes which listen
for segments.gen changes, receive the event and refresh their Solr instances
by calling SolrServer.commit().


 By refreshes do you mean reopened a new Searcher?  Does commit + post
 commit event not work for you?


Currently I use the following code to refresh cores:

new EmbeddedSolrServer(cores, coreName).commit()




 By kicks Solr I hope you don't mean a Solr/container restart! :)


:) No, I mean the same refresh code i.e. calling SolrServer.commit().

-- 
Andrew Klochkov


Re: Unable to import data from database

2009-04-28 Thread ahammad

Did you define all the fields that you used in schema.xml?



Ci-man wrote:
 
 I am using MS SQL server and want to index a table.
 I setup my data-config like this:
 
 <dataConfig>
   <dataSource type="JdbcDataSource" batchSize="25000"
       autoCommit="true"
       driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
       url="jdbc:sqlserver://localhost:1433;databaseName=MYDB"
       user="" password=""/>

   <document name="products">
     <entity name="item" query="select TOP 50 * from items">
       <field column="item_id" name="id"/>
       <field column="itemname" name="name"/>
       <field column="itemavgbucost" name="price"/>
       <field column="categoryname" name="cat"/>
       <field column="itemdesc" name="features"/>
     </entity>
   </document>
 </dataConfig>

 I am unable to load data from the database. I always receive 0 documents
 fetched:

 <lst name="statusMessages">
   <str name="Time Elapsed">0:0:12.989</str>
   <str name="Total Requests made to DataSource">1</str>
   <str name="Total Rows Fetched">0</str>
   <str name="Total Documents Processed">0</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2009-04-28 14:37:49</str>
 </lst>
 
 The query runs in SQL Server query manager and retrieves records. The
 funny thing is, even if I purposefully write a wrong query with
 non-existing tables I get the same response. What am I doing wrong? How
 can I tell whether a query fails or succeeds or if solr is running the
 query in the first place?
 
 Any help is appreciated.
 Best,
 -Ci 
 
 
 




Multiple Queries

2009-04-28 Thread Ankush Goyal
Hi,

I have been trying to solve a performance issue. I have an index of hotels
with their ids and another index of reviews. When someone queries for a
location, the current process gets all the hotels for that location.
Then, for each hotel id from the hotel documents, it calls the review index
to fetch the reviews associated with that particular hotel, and repeats this
for every hotel. This slows down the request significantly.
I need to accumulate reviews by their corresponding hotel ids, so I can't
just fetch all the reviews for all the hotel ids and show them. Now, I am
thinking about fetching all the reviews for all the hotel ids, then parsing
all those reviews in one go and creating a map with hotel id as key and list
of reviews as values.
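
A sketch of that accumulation with SolrJ (imports omitted; field names and
the join() helper are hypothetical):

    Map<String, List<SolrDocument>> reviewsByHotel = new HashMap<String, List<SolrDocument>>();
    SolrQuery q = new SolrQuery("hotelId:(" + join(hotelIds, " OR ") + ")");
    q.setRows(10000);
    for (SolrDocument review : server.query(q).getResults()) {
        String hotelId = (String) review.getFieldValue("hotelId");
        List<SolrDocument> list = reviewsByHotel.get(hotelId);
        if (list == null) {
            list = new ArrayList<SolrDocument>();
            reviewsByHotel.put(hotelId, list);
        }
        list.add(review);
    }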

Can anyone comment on whether this procedure would be better or worse, or if 
there's better way of doing this?

--Ankush Goyal


Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Amit Nithian
I do remember LuSQL and a discussion regarding the performance implications
of using it compared to the DIH. My only reason to stick with DIH is that we
may have other data sources for document loading in the near term that may
make LuSQL too specific for our needs.

Regarding the bug to write to the index in a separate thread, while helpful,
doesn't address my use case which is as follows:
1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a
bogus local request with path='/dataimport') so that the DIH code is invoked
2) Instead of using DirectUpdate2 update handler, write a custom update
handler to take a solr document and POST to a remote Solr server. I could
queue documents here and POST in bulk but that's details..
3) Possibly multi-thread the DIH so that multiple threads can process
different database segments, construct and POST solr documents.
  - For example, thread 1 processes IDs 1-100, thread 2, 101-200, thread 3,
201-...
  - If the Solr Server is multithreaded in writing to the index, that's
great and helps in performance.

#3 is possible depending on performance tests. #1 and #2 I believe I need
because I want my loader separated from the master server for development,
deployment and just general separation of concerns.

Thanks
Amit

On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton glen.new...@gmail.com wrote:

 Amit,

 You might want to take a look at LuSql[1] and see if it may be
 appropriate for the issues you have.

 thanks,

 Glen

 [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

 2009/4/27 Amit Nithian anith...@gmail.com:
  All,
  I have a few questions regarding the data import handler. We have some
  pretty gnarly SQL queries to load our indices and our current loader
  implementation is extremely fragile. I am looking to migrate over to the
  DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom
 stuff
  to remotely load the indices so that my index loader and main search
 engine
  are separated.
  Currently, unless I am missing something, the data gathering from the
 entity
  and the data processing (i.e. conversion to a Solr Document) is done
  sequentially and I was looking to make this execute in parallel so that I
  can have multiple threads processing different parts of the resultset and
  loading documents into Solr. Secondly, I need to create temporary tables
 to
  store results of a few queries and use them later for inner joins was
  wondering how to best go about this?
 
  I am thinking to add support in DIH for the following:
  1) Temporary tables (maybe call it temporary entities)? --Specific only
 to
  SQL though unless it can be generalized to other sources.
  2) Parallel support
   - Including some mechanism to get the number of records (whether it be
  count or the MAX(custom_id)-MIN(custom_id))
  3) Support in DIH or Solr to post documents to a remote index (i.e.
 create a
  new UpdateHandler instead of DirectUpdateHandler2).
 
  If any of these exist or anyone else is working on this (OR you have
 better
  suggestions), please let me know.
 
  Thanks!
  Amit
 



 --



RE: facet with group by (or field collapsing)

2009-04-28 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
I began a similar thread under the subject "Distinct terms in facet field".

One thing I noticed, though, is that your fields seem to have a lot of
controlled values, or lack free text.  Are you sure SOLR is what you should
be using?  Perhaps a traditional RDB would be better; then you would have
GROUP BY and aggregate functions at your disposal...
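
As a rough illustration of that route (assuming one row per kid and sport;
the table layout is hypothetical):

    SELECT Favorite_Sport, COUNT(DISTINCT Family_Id) AS families
    FROM kid_sports
    WHERE Age BETWEEN 10 AND 12
      AND School IN ('School_A', 'School_B')
    GROUP BY Favorite_Sport;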

HTH,
Tim

-Original Message-
From: Qingdi [mailto:liuqin...@yahoo.com] 
Sent: Tuesday, April 28, 2009 1:07 PM
To: solr-user@lucene.apache.org
Subject: facet with group by (or field collapsing)


Hi,

Is it possible to group the search result on certain field and then do facet
counting?

For example, the index is defined with the following fields:
Kid_Id, Family_Id, Age, School, Favorite_Sports (MultiValue Field)

We want to query with Age between 10 yrs to 12 yrs and School in (School_A,
School_B), and do faceting on Favorite_Sports. But instead of showing the
count of kids for each sport, we want to show the count of Families.

Each family can have multiple kids. How to group the search result on
Family_Id, and then do faceting on Favorite_Sports?

Appreciate your help.

Qingdi





Re: Multiple Queries

2009-04-28 Thread Erick Erickson
Have you considered indexing the reviews along with the hotels right
in the hotel index? That way you would fetch the reviews right along with
the hotels...

Really, this is another way of saying flatten your data <g>...

Your idea of holding all the hotel reviews in memory is also viable,
depending upon how many there are. You'd pay some startup costs, but
that's what caching is all about.

Given your current index structure, have you tried collecting the hotel IDs
and submitting a query to your review index that just ORs together all the
IDs, then parsing that, rather than calling your review index for one hotel
ID at a time?
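
i.e., a single request along the lines of (field name hypothetical):

    q=hotelId:(4711 OR 4712 OR 4713)&rows=1000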

Best
Erick

On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.comwrote:

 Hi,

 I have been trying to solve a performance issue: I have an index of hotels
 with their ids and another index of reviews. Now, when someone queries for a
 location, the current process gets all the hotels for that location.
 And, then corresponding to each hotel-id from all the hotel documents, it
 calls the review index to fetch reviews associated with that particular
 hotel and so on it repeats for all the hotels. This process slows down the
 request significantly.
 I need to accumulate reviews according to corresponding hotel-ids, so I
 can't just fetch all the reviews for all the hotel ids and show them. Now, I
 was thinking about fetching all the reviews for all the hotel-ids and then
 parse all those reviews in one go and create a map with hotel-id as key and
 list of reviews as values.

 Can anyone comment on whether this procedure would be better or worse, or
 if there's better way of doing this?

 --Ankush Goyal
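
For illustration, a minimal solr-ruby sketch of the OR-query-then-map
approach above. The field name hotel_id, the URL, and the id list are
assumptions for the example, not details from this thread:

   require 'solr'

   reviews = Solr::Connection.new("http://localhost:8983/solr")
   hotel_ids = [101, 102, 103]   # ids collected from the hotel query

   # One OR-ed query against the review index instead of N separate calls.
   response = reviews.query("hotel_id:(#{hotel_ids.join(' OR ')})", :rows => 1000)

   # Bucket the reviews by hotel id in a single pass.
   reviews_by_hotel = Hash.new { |h, k| h[k] = [] }
   response.hits.each { |doc| reviews_by_hotel[doc['hotel_id']] << doc }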



Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Thanh Doan
After posting this question I found this discussion
http://www.nabble.com/Hierarchical-Facets--to7135353.html.

So what I did was adapt the scheme with 3 fields (cat, subcat, subsubcat)
and hardcode the hierarchical logic in the UI layer to present a
hierarchical taxonomy to the users.

The users still see something similar to this page:
http://www.overstock.com/Electronics/Digital-Cameras/Canon,/brand,/813/cat.html

But I have to say that hardcoding the hierarchical logic in the UI layer is messy.

It looks like Koji patch will be a much better solution.

Thanks Koji!

Thanh

On Tue, Apr 28, 2009 at 11:27 AM, Matt Mitchell goodie...@gmail.com wrote:
 Wow, this looks great. Thanks for this Koji!

 Matt

 On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Thanh Doan wrote:

 Assuming a solr search returns 10 listing items as below

 1) 4 digital cameras
 2) 4 LCD televisions
 3) 2 clothing items

 If we navigate to /electronics, we want Solr to show us facets specific
 to the 8 electronics items (e.g. brand, price).
 If we navigate to /electronics/cameras, we want Solr to show us facets
 specific to the 4 camera items (e.g. mega-pixels, screen-size, brand, price).
 If we navigate to /electronics/televisions, we want to see different
 facets and their counts specific to the TV items.
 If we navigate to /clothing, we want to obtain totally different facets
 and their counts.

 I am not sure if we can think of this as a Hierarchical Facet Navigation
 system or not.
 From the UI perspective, we can think of /electronics/cameras as a
 hierarchical classification.



 There is a patch for Hierarchical Facet Navigation:

 https://issues.apache.org/jira/browse/SOLR-64

  But how about electronics/cameras/canon vs electronics/canon/camera.
 In this case both navigation should show the same result set no matter
 which facet is selected first.



 The patch supports a document having multiple hierarchical facet fields,
 for example:

 <add>
  <doc>
   <field name="name">Canon Brand-new Digital Camera</field>
   <field name="cat">electronics/cameras/canon</field>
   <field name="cat">electronics/canon/cameras</field>
  </doc>
 </add>


 Koji

  My question is with the current solr implementation can we  provide
 context dependent faceted navigation from SOLR search results?

 Thank you.
 Thanh Doan









-- 
Regards,
Thanh Doan
713-884-0576
http://datamatter.blogspot.com/


Re: MacOS Failed to initialize DataSource:db+ DataimportHandler ???

2009-04-28 Thread gateway0

That didn't work either.

All my libraries are at /Applications/tomcat/webapps/solr/WEB-INF/lib
So is apache-solr-dataimporthandler-1.3.0.jar

However I did create a new /lib directory under my solr home at
/Applications/solr and copied the jar to that location as well.
But no difference.

Here is my entry for the dataimporthandler in solrconfig.xml
(path:/Applications/solr/conf):

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/Applications/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>




Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 Apparently you do not have the driver in the path. Drop your driver
 jar into ${solr.home}/lib.
 
 On Tue, Apr 28, 2009 at 4:42 AM, gateway0 reiterwo...@yahoo.de wrote:

 Hi,

 sure:
 
 message Severe errors in solr configuration. Check your log files for
 more
 detailed information on what may be wrong. If you want solr to continue
 after configuration errors, change:
 <abortOnConfigurationError>false</abortOnConfigurationError> in null
 -
 org.apache.solr.common.SolrException: FATAL: Could not create importer.
 DataImporter config invalid at
 org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
 at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:480) at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
 at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
 at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
 at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
 at
 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627)
 at
 org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
 at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
 at
 org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
 at
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
 at
 org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at
 org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at
 org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at
 org.apache.catalina.core.StandardService.start(StandardService.java:516)
 at
 org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at
 org.apache.catalina.startup.Catalina.start(Catalina.java:578) at
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585) at
 org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at
 org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by:
 org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to
 initialize DataSource: mydb Processing Document # at
 org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308)
 at
 org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273)
 at
 org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228)
 at
 org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98)
 at
 org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
 ... 31 more Caused by: org.apache.solr.common.SolrException: Could not
 load
 driver: com.mysql.jdbc.Driver at
 org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:65)
 at
 org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306)
 ... 35 more Caused by: java.lang.ClassNotFoundException: Unable to load
 com.mysql.jdbc.Driver or
 org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver at
 org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587)
 at
 

RE: facet with group by (or field collapsing)

2009-04-28 Thread Qingdi

Hi Tim,

Thanks for your reply. The index structure in my original post is just an
example. We do have many free text fields with different analyzers.

I checked your post "Distinct terms in facet field", but I think the issues
we are trying to address are different: yours is to get the distinct terms in
the facet field, while what I want is to count the distinct values of a
non-facet field.

Since the facet results are much smaller than the query result, you could
get all the facets and count by yourself. But in my case, if I count by
myself, I have to get all the query results, and then count on distinct
family_id for each facet value.
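
For illustration, a client-side sketch of that distinct-family counting in
solr-ruby; the URL and row limit are made up, the field names are just the
example schema above, and this only works if all matching rows can be fetched:

   require 'solr'
   require 'set'

   conn = Solr::Connection.new("http://localhost:8983/solr")
   resp = conn.query('Age:[10 TO 12] AND School:(School_A OR School_B)',
                     :rows => 10000, :field_list => ['Family_Id', 'Favorite_Sports'])

   # Collect the distinct families per sport, then count them.
   families_per_sport = Hash.new { |h, k| h[k] = Set.new }
   resp.hits.each do |doc|
     Array(doc['Favorite_Sports']).each do |sport|
       families_per_sport[sport] << doc['Family_Id']
     end
   end
   family_counts = families_per_sport.map { |sport, fams| [sport, fams.size] }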

Thanks.

Qingdi


Harsch, Timothy J. (ARC-SC)[LOCKHEED MARTIN SPACE OPNS] wrote:
 
 I began a similar thread under the subject Distinct terms in facet
 field.
 
 One thing I noticed though is that your fields seem to have a lot of
 controlled values, or lack free text.  Are you sure SOLR is what you
 should be using?  Perhaps a traditional RDB would be better and then you
 would have GROUP BY and aggregate functions at your disposal...
 
 HTH,
 Tim
 
 -Original Message-
 From: Qingdi [mailto:liuqin...@yahoo.com] 
 Sent: Tuesday, April 28, 2009 1:07 PM
 To: solr-user@lucene.apache.org
 Subject: facet with group by (or field collapsing)
 
 
 Hi,
 
 Is it possible to group the search result on certain field and then do
 facet
 counting?
 
 For example, the index is defined with the following fields:
 Kid_Id, Family_Id, Age, School, Favorite_Sports (MultiValue Field)
 
 We want to query with Age between 10 yrs to 12 yrs and School in
 (School_A,
 School_B), and do faceting on Favorite_Sports. But instead of showing the
 count of kids for each sport, we want to show the count of Families.
 
 Each family can have multiple kids. How to group the search result on
 Family_Id, and then do faceting on Favorite_Sports?
 
 Appreciate your help.
 
 Qingdi
 
 
 
 
 




Re: WordDelimiterFilterFactory removes words when options set to 0

2009-04-28 Thread Chris Hostetter

: In trying to understand the various options for 
: WordDelimiterFilterFactory, I tried setting all options to 0. This seems 
: to prevent a number of words from being output at all. In particular 
: can't and 99dxl don't get output, nor do any words containing hyphens. 
: Is this correct behavior?

For the record: there are other options you haven't set... splitOnNumerics 
defaults to 1, and preserveOriginal defaults to 0. I'm guessing that if you 
set splitOnNumerics=0 you'd see a lot more tokens come through, and if 
you set preserveOriginal=1 you'd definitely see a lot more tokens come 
through by default. (A variant spelling this out follows your config below.)

: <fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
:   <analyzer>
:     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
:     <filter class="solr.WordDelimiterFilterFactory"
:       splitOnCaseChange="0"
:       generateWordParts="0"
:       generateNumberParts="0"
:       catenateWords="0"
:       catenateNumbers="0"
:       catenateAll="0"
:     />
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldtype>
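
For illustration, the same filter with those two options set explicitly (a
sketch of the change suggested above; all other options unchanged):

  <filter class="solr.WordDelimiterFilterFactory"
    splitOnCaseChange="0"
    generateWordParts="0"
    generateNumberParts="0"
    catenateWords="0"
    catenateNumbers="0"
    catenateAll="0"
    splitOnNumerics="0"
    preserveOriginal="1"
  />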

-Hoss



Re: half width katakana

2009-04-28 Thread Chris Hostetter

: The exception is expected if you use CharStream aware Tokenizer without
: CharFilters.

Koji: I thought all of the casts had been eliminated and replaced with 
a call to CharReader.get(Reader)?

: Please see example/solr/conf/schema.xml for the setting of CharFilter and
: CharStreamAware*Tokenizer:


:  Using CharStreamAwareCJKTokenizerFactory is giving me following error,
:  SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to
:  org.apache.solr.analysis.CharStream
:  
:  May be you are typecasting Reader to subclass.

-Hoss



RE: fl parameter

2009-04-28 Thread Chris Hostetter

: Anyone able to help with the question below?

Dealing with fl is a delicate dance in Solr right now ... complicated by 
both FieldSelector logic and distributed search (where both DocList and 
SolrDocumentList objects need to be dealt with).

I looked at this recently and even I can't remember what does what at the 
moment ... I think you can do what you want just by writing a 
QueryResponseWriter, but it might also be possible to do it as a 
SearchComponent that prunes any SolrDocumentList objects and actualizes 
any DocList objects using just the fields you want.

The way to be sure is to look for all uses of CommonParams.FL in the code 
base.

: Yonik, I couldn't find the issues you speak of; can you point me in the right 
direction?

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams



-Hoss



Re: half width katakana

2009-04-28 Thread Koji Sekiguchi

Chris Hostetter wrote:

: The exception is expected if you use CharStream aware Tokenizer without
: CharFilters.

Koji: i thought all of the casts had been eliminated and replaced with 
a call to CharReader.get(Reader) ?


  

Yeah, right. After r758137, ClassCastException should be eliminated.

http://svn.apache.org/viewvc?view=revrevision=758137

And then CharReader.get(Reader) idiom added as hoss suggested:

http://svn.apache.org/viewvc?view=revrevision=758161

Ashish, what revision/nightly version did you use when you got the 
ClassCastException?


Koji




field type for serialized code?

2009-04-28 Thread Matt Mitchell
Hi,

I'm attempting to serialize a simple ruby object into a solr.StrField - but
it seems that what I'm getting back is munged up a bit, in that I can't
de-serialize it. Is there a field type for doing this type of thing?

Thanks,
Matt


Re: Multiple Queries

2009-04-28 Thread Amit Nithian
Ankush,
Unless reviews are changing constantly, why not do what Erick was saying
and flatten your data by storing reviews with the hotel index: re-index
your hotels with the top two reviews stored. I guess I am suggesting
computing the top two reviews for each hotel offline and storing them
somewhere.

You could store the top two reviews in an RDBMS and let whatever front end
you have retrieve them from the RDBMS after receiving results from
Solr, based on your unique ID.

HTH
Amit

On Tue, Apr 28, 2009 at 3:14 PM, Ankush Goyal ankush.go...@orbitz.com wrote:

 Hi Erick,

 Thanks for the response! The solution I was talking about is the same as
 your last suggestion: get reviews for only the required hotel-ids and then
 parse them in one go to make a hash-map. I guess I didn't explain it
 correctly :)

 As far as putting reviews inside the hotel index is concerned, we thought
 about that solution, but we also need to sort the reviews and (let's say)
 show the top 2 of maybe 50 reviews for a hotel, so we couldn't put reviews
 inside the hotel doc itself.

 Now, this poses another question for the solution we talked about: getting
 reviews for the required hotel-ids and then making a hash-map keyed on
 hotel-id can improve performance, but we also need to sort all the reviews
 for each hotel using a field/score in the review doc itself, which seems
 like it would drastically lower performance.

 Any ideas on a better solution?

 Thanks!
 -Ankush

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, April 28, 2009 4:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Multiple Queries

 Have you considered indexing the reviews along with the hotels right
 in the hotel index? That way you would fetch the reviews right along with
 the hotels...

 Really, this is another way of saying flatten your data <G>...

 Your idea of holding all the hotel reviews in memory is also viable,
 depending upon
 how many there are. you'd pay some startup costs, but that's what caching
 is
 all
 about.

 Given your current index structure, have you tried collecting the hotel
 IDs,
 and
 submitting a query to your review index that just ORs together all the IDs
 and
 then parsing that rather than calling your review index for one hotel ID at
 a time?

 Best
 Erick

 On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.com
 wrote:

  Hi,
 
  I have been trying to solve a performance issue: I have an index of
 hotels
  with their ids and another index of reviews. Now, when someone queries
 for a
  location, the current process gets all the hotels for that location.
  And, then corresponding to each hotel-id from all the hotel documents, it
  calls the review index to fetch reviews associated with that particular
  hotel and so on it repeats for all the hotels. This process slows down
 the
  request significantly.
  I need to accumulate reviews according to corresponding hotel-ids, so I
  can't just fetch all the reviews for all the hotel ids and show them.
 Now, I
  was thinking about fetching all the reviews for all the hotel-ids and
 then
  parse all those reviews in one go and create a map with hotel-id as key
 and
  list of reviews as values.
 
  Can anyone comment on whether this procedure would be better or worse, or
  if there's better way of doing this?
 
  --Ankush Goyal
 



Re: how to reset the index in solr

2009-04-28 Thread Geetha

Thank you Erik..

Should I write the below code in a rake task at /lib/tasks/solr.rake?

I am a newbie to Ruby.


Erik Hatcher wrote:


On Apr 24, 2009, at 1:54 AM, sagi4 wrote:

Can I get the rake task for clearing the index of Solr, I mean rake
index::rebuild? It would be very helpful, and would also avoid doing the
delete-by-id manually.


How do you currently build your index?

But making a Rake task to perform Solr operations is generally 
pretty trivial. In Ruby (after gem install solr-ruby):


   require 'solr'
   solr = Solr::Connection.new("http://localhost:8983/solr")
   solr.optimize  # for example

Erik








Re: Multiple Queries

2009-04-28 Thread Avlesh Singh
Ankush,
Your approach works. Fire an "in" query on the review index for all the
hotel ids you care about, then create a map of hotel to its reviews.

Cheers
Avlesh

On Wed, Apr 29, 2009 at 8:09 AM, Amit Nithian anith...@gmail.com wrote:

 Ankush,
 It seems that unless reviews are changing constantly, why not do what Erick
 was saying in flattening your data by storing reviews with the hotel index
 but re-index your hotels storing the top two reviews. I guess I am
 suggesting computing the top two reviews for each hotel offline and store
 them somewhere.

 You could store the top two reviews in an RDBMS and let whatever front end
 you have retrieve the top two from the RDBMS after receiving results from
 Solr based on your unique ID.

 HTH
 Amit

 On Tue, Apr 28, 2009 at 3:14 PM, Ankush Goyal ankush.go...@orbitz.com
 wrote:

  Hi Erick,
 
  Thanks for response!...the solution I was talking about was same as your
  last solution to get reviews for only required hotel-ids and then parsing
  them in one go to make a hash-map, I guess I didn't explain correctly :)
 
  As far as putting reviews inside the hotel index is concerned, we thought
  about that solution, but we also need to sort the reviews and (let's say)
  show top 2 of maybe 50 reviews for a hotel, so we couldn't put reviews
  inside hotel doc itself.
 
  Now, this again poses another question for the solution we talked about-,
  as it seems like getting reviews for required hotel-ids and then making a
  hash-map corresponding to hotel-ids can improve the performance, but then
 we
  also need to sort all the reviews for each hotel using a field/ score in
 the
  review-doc itself, which seems like would lower down the performance
  drastically.
 
  Any ideas on a better solution?
 
  Thanks!
  -Ankush
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Tuesday, April 28, 2009 4:05 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Multiple Queries
 
  Have you considered indexing the reviews along with the hotels right
  in the hotel index? That way you would fetch the reviews right along with
  the hotels...
 
  Really, this is another way of saying flatten your data <G>...
 
  Your idea of holding all the hotel reviews in memory is also viable,
  depending upon
  how many there are. you'd pay some startup costs, but that's what caching
  is
  all
  about.
 
  Given your current index structure, have you tried collecting the hotel
  IDs,
  and
  submitting a query to your review index that just ORs together all the
 IDs
  and
  then parsing that rather than calling your review index for one hotel ID
 at
  a time?
 
  Best
  Erick
 
  On Tue, Apr 28, 2009 at 4:32 PM, Ankush Goyal ankush.go...@orbitz.com
  wrote:
 
   Hi,
  
   I have been trying to solve a performance issue: I have an index of
  hotels
   with their ids and another index of reviews. Now, when someone queries
  for a
   location, the current process gets all the hotels for that location.
   And, then corresponding to each hotel-id from all the hotel documents,
 it
   calls the review index to fetch reviews associated with that particular
   hotel and so on it repeats for all the hotels. This process slows down
  the
   request significantly.
   I need to accumulate reviews according to corresponding hotel-ids, so I
   can't just fetch all the reviews for all the hotel ids and show them.
  Now, I
   was thinking about fetching all the reviews for all the hotel-ids and
  then
   parse all those reviews in one go and create a map with hotel-id as key
  and
   list of reviews as values.
  
   Can anyone comment on whether this procedure would be better or worse,
 or
   if there's better way of doing this?
  
   --Ankush Goyal
  
 



Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
Writing to a remote Solr through SolrJ is in the cards; I may even
take it up after the 1.4 release. For now your best bet is to subclass
SolrWriter and override the corresponding methods for add/delete.

On Wed, Apr 29, 2009 at 2:06 AM, Amit Nithian anith...@gmail.com wrote:
 I do remember LuSQL and a discussion regarding the performance implications
 of using it compared to the DIH. My only reason to stick with DIH is that we
 may have other data sources for document loading in the near term that may
 make LuSQL too specific for our needs.

 Regarding the bug to write to the index in a separate thread, while helpful,
 doesn't address my use case which is as follows:
 1) Write a loader application using EmbeddedSolr + SolrJ + DIH (create a
 bogus local request with path='/dataimport') so that the DIH code is invoked
 2) Instead of using DirectUpdate2 update handler, write a custom update
 handler to take a solr document and POST to a remote Solr server. I could
 queue documents here and POST in bulk but that's details..
 3) Possibly multi-thread the DIH so that multiple threads can process
 different database segments, construct and POST solr documents.
  - For example, thread 1 processes IDs 1-100, thread 2, 101-200, thread 3,
 201-...
  - If the Solr Server is multithreaded in writing to the index, that's
 great and helps in performance.

 #3 is possible depending on performance tests. #1 and #2 I believe I need
 because I want my loader separated from the master server for development,
 deployment and just general separation of concerns.

 Thanks
 Amit

 On Tue, Apr 28, 2009 at 6:03 AM, Glen Newton glen.new...@gmail.com wrote:

 Amit,

 You might want to take a look at LuSql[1] and see if it may be
 appropriate for the issues you have.

 thanks,

 Glen

 [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql

 2009/4/27 Amit Nithian anith...@gmail.com:
  All,
  I have a few questions regarding the data import handler. We have some
  pretty gnarly SQL queries to load our indices and our current loader
  implementation is extremely fragile. I am looking to migrate over to the
  DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom
 stuff
  to remotely load the indices so that my index loader and main search
 engine
  are separated.
  Currently, unless I am missing something, the data gathering from the
 entity
  and the data processing (i.e. conversion to a Solr Document) is done
  sequentially and I was looking to make this execute in parallel so that I
  can have multiple threads processing different parts of the resultset and
  loading documents into Solr. Secondly, I need to create temporary tables
 to
  store results of a few queries and use them later for inner joins was
  wondering how to best go about this?
 
  I am thinking to add support in DIH for the following:
  1) Temporary tables (maybe call it temporary entities)? --Specific only
 to
  SQL though unless it can be generalized to other sources.
  2) Parallel support
   - Including some mechanism to get the number of records (whether it be
  count or the MAX(custom_id)-MIN(custom_id))
  3) Support in DIH or Solr to post documents to a remote index (i.e.
 create a
  new UpdateHandler instead of DirectUpdateHandler2).
 
  If any of these exist or anyone else is working on this (OR you have
 better
  suggestions), please let me know.
 
  Thanks!
  Amit
 








-- 
--Noble Paul


Re: how to reset the index in solr

2009-04-28 Thread Geetha
I need a function (through solr-ruby) that will allow us to clear
everything.


regards,
Sg..
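
A minimal sketch of such a task, assuming the solr-ruby gem (its
delete_by_query and commit calls) and a made-up task name:

   # lib/tasks/solr.rake
   require 'solr'

   namespace :solr do
     desc "Delete all documents from the Solr index"
     task :clear do
       solr = Solr::Connection.new("http://localhost:8983/solr")
       solr.delete_by_query('*:*')   # remove every document
       solr.commit                   # make the deletion visible to searches
     end
   end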

Geetha wrote:

Thank you Erik..

Should I write the below code in rake task /lib/tasks/solr.rake?

I am newbie to ruby.


Erik Hatcher wrote:


On Apr 24, 2009, at 1:54 AM, sagi4 wrote:

Can I get the rake task for clearing the index of Solr, I mean rake
index::rebuild? It would be very helpful, and would also avoid doing the
delete-by-id manually.


How do you currently build your index?

But making a Rake task to perform Solr operations is generally 
pretty trivial. In Ruby (after gem install solr-ruby):


   require 'solr'
   solr = Solr::Connection.new("http://localhost:8983/solr")
   solr.optimize  # for example

Erik










--
Best Regards,
Geetha S | System and Software Engineer







Re: field type for serialized code?

2009-04-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
is the serialized data a UTF-8 string?
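
If it is not (Ruby's Marshal output is raw bytes, not UTF-8 text), one
workaround is to Base64-encode the bytes so they round-trip safely through
a solr.StrField. A sketch, not a built-in Solr field type:

   require 'base64'

   obj = { :name => "example" }                  # stand-in for the real object
   encoded = Base64.encode64(Marshal.dump(obj))  # ASCII-safe string for the StrField

   # ...index `encoded` in the StrField; on the way back:
   restored = Marshal.load(Base64.decode64(encoded))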

On Wed, Apr 29, 2009 at 6:42 AM, Matt Mitchell goodie...@gmail.com wrote:
 Hi,

 I'm attempting to serialize a simple ruby object into a solr.StrField - but
 it seems that what I'm getting back is munged up a bit, in that I can't
 de-serialize it. Is there a field type for doing this type of thing?

 Thanks,
 Matt




-- 
--Noble Paul