Re: Greater-than and less-than in data import SQL queries
On Mon, Nov 2, 2009 at 11:34 AM, Amit Nithian anith...@gmail.com wrote: A thought I had on this from a DIH design perspective. Would it be better to have the SQL queries stored in an element rather than an attribute, so that you can wrap it in a CDATA block without having to mess up the look of the query with &lt; and &gt;? Makes debugging easier (I know find and replace is trivial, but it can be annoying when debugging SQL issues :-)).

Actually most of the parsers are forgiving in this aspect. I mean '<' and '>' are ok in the xml parser shipped with the jdk.

On Wed, Oct 28, 2009 at 5:15 PM, Lance Norskog goks...@gmail.com wrote: It is easier to put SQL select statements in a view, and just use that view from the DIH configuration file.

On Tue, Oct 27, 2009 at 12:30 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Heh, eventually I decided "where 4 > node_depth" was the most pleasing (if slightly WTF-ish) way of writing it... Cheers, Andrew.

Erik Hatcher-4 wrote: Use &lt; instead of < in that attribute. That should fix the issue. Remember, it's an XML file, so it has to obey XML encoding rules, which makes it ugly, but whatcha gonna do? Erik

On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote: Hi, If I have a DataImportHandler query with a greater-than sign in it, like this:

<entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth > 4">

everything's fine. However, if it contains a less-than sign:

<entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth < 4">

I get this exception:

INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
[Fatal Error] :240:129: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
27-Oct-2009 15:30:49 org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
        at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:184)
        at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:101)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:424)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
        at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
        at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
        at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at ...
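For what it's worth, with Erik's suggestion applied, the failing entity would just become something like this (a sketch of the same config, not tested):

<entity name="higher_node" dataSource="database"
        query="select *, title as keywords from cathnode_text where node_depth &lt; 4">
  ...
</entity>

The XML parser decodes &lt; back to < before the query ever reaches the database, so the SQL itself is unchanged.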
Problems downloading lucene 2.9.1
Hi folks, as we are using a snapshot dependency on solr1.4, today we are getting problems when Maven tries to download lucene 2.9.1 (there isn't any 2.9.1 there). Which repository can I use to download it? Thx -- Lici
RE: CPU utilization and query time high on Solr slave when snapshot install
Hi Solr Gurus,

We have Solr in a 1 master, 2 slave configuration. A snapshot is created post commit and post optimization. We have autocommit after 50 documents or 5 minutes. The snapshot puller runs as a cron every 10 minutes. What we have observed is that whenever a snapshot is installed on a slave, the solrj client used to query the slave gets timed out and there is high CPU usage/load avg. on the slave server. If we stop the snapshot puller, then the slaves work with no issues. The system has been running for 2 months and this issue has started to occur only now that load on the website is increasing. Following are some details:

Solr Details:
apache-solr Version: 1.3.0
Lucene - 2.4-dev

Master/Slave configurations:
Master:
- for indexing data, HTTP requests are made on the Solr server
- autocommit feature is enabled for 50 docs and 5 minutes
- caching params are disabled for this server
- mergeFactor of 10 is set
- we were running the optimize script every 2 hours, but have now reduced that to twice a day; the issue still persists
Slave1/Slave2:
- standard requestHandler is being used
- default values of caching are set

Machine Specifications:
Master:
- 4GB RAM
- 1GB JVM heap memory is allocated to Solr
Slave1/Slave2:
- 4GB RAM
- 2GB JVM heap memory is allocated to Solr

Master and Slave1 (solr1) are on a single box and Slave2 (solr2) is on a different box. We use HAProxy to load balance query requests between the 2 slaves. The master is only used for indexing.

Please let us know if somebody has ever faced a similar kind of issue or has some insight into it, as we are literally stuck at the moment with a very unstable production environment. As a workaround, we have started running optimize on the master every 7 minutes. This seems to have reduced the severity of the problem, but the issue still occurs every 2 days now. Please suggest what could be the root cause of this. Thanks, Bipul
Re: Indexing multiple entities
I'm using a code generator for my entities, and I cannot modify the generation. I need to work out another option :( shouldn't code generators help development and not make it more complex and difficult? oO (sry off topic) chantal
Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours
I'm able to reproduce this issue consistently using JDK 1.6.0_16. After an optimize is called, only one thread keeps adding documents and the rest wait on StreamingUpdateSolrServer line 196.

On Sun, Oct 25, 2009 at 8:03 AM, Dadasheva, Olga olga_dadash...@harvard.edu wrote: I am using java 1.6.0_05. To illustrate what is happening I wrote this test program that has 10 threads adding a collection of documents and one thread optimizing the index every 10 sec. I am seeing that after the first optimize there is only one thread that keeps adding documents. The other ones are locked. In the real code I ended up adding synchronized around add and optimize to avoid this.

public static void main(String[] args) {
    final JettySolrRunner jetty = new JettySolrRunner("/solr", 8983);
    try {
        jetty.start();
        // setup the server...
        String url = "http://localhost:8983/solr";
        final StreamingUpdateSolrServer server = new StreamingUpdateSolrServer(url, 2, 5) {
            @Override
            public void handleError(Throwable ex) {
                // do something...
            }
        };
        server.setConnectionTimeout(1000);
        server.setDefaultMaxConnectionsPerHost(100);
        server.setMaxTotalConnections(100);
        int i = 0;
        while (i++ < 10) {
            new Thread("add-thread" + i) {
                public void run() {
                    int j = 0;
                    while (true) {
                        try {
                            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
                            for (int n = 0; n < 50; n++) {
                                SolrInputDocument doc = new SolrInputDocument();
                                String docID = this.getName() + "_doc_" + j++;
                                doc.addField("id", docID);
                                doc.addField("content", "document_" + docID);
                                docs.add(doc);
                            }
                            server.add(docs);
                            System.out.println(this.getName() + " added " + docs.size() + " documents");
                            Thread.sleep(100);
                        } catch (Exception e) {
                            e.printStackTrace();
                            System.err.println(this.getName() + " " + e.getLocalizedMessage());
                            System.exit(0);
                        }
                    }
                }
            }.start();
        }
        new Thread("optimizer-thread") {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(10000);
                        server.optimize();
                        System.out.println(this.getName() + " optimized");
                    } catch (Exception e) {
                        e.printStackTrace();
                        System.err.println("optimizer " + e.getLocalizedMessage());
                        System.exit(0);
                    }
                }
            }
        }.start();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, October 13, 2009 8:59 PM
To: solr-user@lucene.apache.org
Subject: Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours

Which Java release is this? There are known thread-blocking problems in Java 1.5. Also, what sockets are used during this time? Try 'netstat -s | fgrep 8983' (or your Solr URL port #) and watch the active, TIME_WAIT, CLOSE_WAIT sockets build up. This may give a hint.

On Tue, Oct 13, 2009 at 8:47 AM, Dadasheva, Olga olga_dadash...@harvard.edu wrote: Hi, I am indexing documents using StreamingUpdateSolrServer. My 'setup' code is almost a copy of the junit test of the Solr trunk.

try {
    StreamingUpdateSolrServer streamingServer = new StreamingUpdateSolrServer(url, 2, 5) {
        @Override
        public void handleError(Throwable ex) {
            System.out.println("new StreamingUpdateSolrServer error " + ex);
Lock problems: Lock obtain timed out
Hi, I've got a few machines which post documents concurrently to a solr instance. They do not issue the commit themselves; instead, I've got autocommit set up on the solr server side:

<autoCommit>
  <maxDocs>5</maxDocs> <!-- commit at least every 5 docs -->
  <maxTime>60000</maxTime> <!-- Stays max 60s without commit -->
</autoCommit>

This usually works fine, but sometimes the server goes into a deadlock state. Here are the errors I get from the log (these go on forever until I delete the index and restart all from zero):

02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
... [ multiple messages like this ] ...
02-Nov-2009 10:35:27 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/solrdata/jobs/index/lucene-703db99881e56205cb910a2e5fd816d3-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1538)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1395)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:190)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

I'm wondering what could be the reason for this (if a commit takes more than 60 seconds, for instance?), and if I should use better locking or autocommit options. Here's the locking conf I've got at the moment:

<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
<lockType>native</lockType>

I'm using solr trunk from 12 oct 2009 within tomcat. Thanks for any help. Jerome.
--
Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Spell check suggestion and correct way of implementation and some Questions
On Wed, Oct 28, 2009 at 8:57 PM, darniz rnizamud...@edmunds.com wrote: Question. Should I build the dictionary only once, and after that, as new words are indexed, will the dictionary be updated? Or do I have to do that manually at a certain interval?

No. The dictionary is built only when spellcheck.build=true is specified as a request parameter. You will need to explicitly send spellcheck.build=true again when the document changes, or you can use the buildOnCommit or buildOnOptimize parameters to re-build the index automatically. http://wiki.apache.org/solr/SpellCheckComponent#Building_on_Commits

add the spellcheck component to the handler, in my case as of now the standard request handler. I might also start adding some more dismax handlers depending on my requirements:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

run the query with parameter spell.check=true, and also specify against which dictionary you want to run spell check; in my case my spellcheck.dictionary parameter is mySpellChecker.

The parameter is spellcheck=true, not spell.check=true. If you do not give a name to your dictionary then you do not need to add the spellcheck.dictionary parameter.

--
Regards, Shalin Shekhar Mangar.
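For reference, a minimal spellchecker definition with build-on-commit enabled might look like this; the field name and index dir here are just placeholders, adjust them to your schema:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">mySpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>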
tracking solr response time
Hi, We are using Solr for many of our products and it is doing quite well. But since the number of hits is becoming high, we are experiencing latency in certain requests; about 15% of our requests are suffering from latency. We are trying to identify the problem. It may be a network issue, or the solr server may be taking time to process the request. Other than qtime, which is returned along with the response, is there any other way to track the solr server's performance? How is qtime calculated? Is it the total time from when the solr server got the request till it gave the response? Can we do some extra logging to track the solr server's performance? Ideally I would want to pass some log id along with the request (query) to the solr server, and the solr server must log the response time along with that log id. Thanks in advance.. Bharath
Re: Problems downloading lucene 2.9.1
On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote: Hi folks, as we are using a snapshot dependency on solr1.4, today we are getting problems when maven tries to download lucene 2.9.1 (there isn't any 2.9.1 there). Which repository can i use to download it?

They won't be there until 2.9.1 is officially released. We are trying to speed up the Solr release by piggybacking on the Lucene release, but this little bit is the one downside. -Grant
NullPointerException with TermVectorComponent
Hi, I've recently added the TermVectorComponent as a separate handler, following the example in the supplied config file, i.e.:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

It works, but with one quirk. When you use tv.all=true, you get the tf*idf scores in the output just fine (along with tf and df). But if you use tv.tf_idf=true you get an NPE:

http://server:8080/solr/tvrh/?q=1cuk&version=2.2&indent=on&tv.tf_idf=true

HTTP Status 500 - null java.lang.NullPointerException
        at org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:253)
        at org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:245)
        at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:522)
        at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:401)
        at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:378)
        at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:1253)
        at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:474)
        at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:244)
        at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:125)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at (etc.)

Is this a bug, or am I doing it wrong? Cheers, Andrew.
Re: tracking solr response time
On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com
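For example, a request tagged that way might look like http://localhost:8983/solr/select?q=ipod&logid=req-12345 (the parameter name and value being whatever your client wants to correlate on); Solr ignores the extra param for query purposes, but it shows up in the request log line alongside the other parameters.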
Re: tracking solr response time
On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all the subclasses I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be as a result of an internal network issue between the two servers or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is you client application connection to a Solr server on the same network or over the internet? Are there other applications like database servers etc running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out, there are timestamps there that can also guide you. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
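If it helps, here is a rough SolrJ sketch for comparing the two numbers, assuming a plain CommonsHttpSolrServer pointed at one of the slaves:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingCheck {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        QueryResponse rsp = server.query(new SolrQuery("ipod"));
        // QTime: time spent inside Solr building the in-memory response
        System.out.println("QTime (server): " + rsp.getQTime() + " ms");
        // elapsed time: wall-clock time seen by the client, including network and serialization
        System.out.println("Elapsed (client): " + rsp.getElapsedTime() + " ms");
    }
}

A large gap between the two numbers on a given request points at the network or at response streaming rather than query execution.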
Re: tracking solr response time
On Nov 2, 2009, at 5:41 AM, Yonik Seeley wrote: QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time.

It might be nice to add a flag to DebugComponent to spit out timings only. Thus, one could skip the explains, etc. and just see the timings. Seems like that would have pretty low overhead and still see the timings.
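Until then, debugQuery=true already returns a per-component timing section alongside the explain data, e.g. http://localhost:8983/solr/select?q=ipod&debugQuery=true (host and query just for illustration); the proposed flag would essentially return that timing block without the rest of the debug output.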
Re: NullPointerException with TermVectorComponent
I think it might be to do with the library itself. I downloaded semanticvectors-1.22 and compiled from source. Then I created a demo corpus using java org.apache.lucene.demo.IndexFiles against the lucene src directory. I then ran java pitt.search.semanticvectors.BuildIndex against the index and got the following:

Seedlength = 10
Dimension = 200
Minimum frequency = 0
Number non-alphabet characters = 0
Contents fields are: [contents]
Creating semantic term vectors ...
Populating basic sparse doc vector store, number of vectors: 774
Creating store of sparse vectors ...
Created 774 sparse random vectors.
Creating term vectors ...
There are 36881 terms (and 774 docs)
0 ... 1000 ... 2000 ... 3000 ... 4000 ...
Exception in thread "main" java.lang.NullPointerException
        at org.apache.lucene.index.DirectoryReader$MultiTermDocs.freq(DirectoryReader.java:1068)
        at pitt.search.semanticvectors.LuceneUtils.getGlobalTermFreq(LuceneUtils.java:70)
        at pitt.search.semanticvectors.LuceneUtils.termFilter(LuceneUtils.java:187)
        at pitt.search.semanticvectors.TermVectorsFromLucene.init(TermVectorsFromLucene.java:163)
        at pitt.search.semanticvectors.BuildIndex.main(BuildIndex.java:138)

I am still digging, but when you look at the source code it references lucene calls dating back to lucene 2.4, a lot of which are deprecated; it might need some refreshing.

Cheers, Dave

On 02 November 2009 at 14:40 Andrew Clegg andrew.cl...@gmail.com wrote: Hi, I've recently added the TermVectorComponent as a separate handler, following the example in the supplied config file, i.e.:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

It works, but with one quirk. When you use tv.all=true, you get the tf*idf scores in the output just fine (along with tf and df). But if you use tv.tf_idf=true you get an NPE:

http://server:8080/solr/tvrh/?q=1cuk&version=2.2&indent=on&tv.tf_idf=true

HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:253) at org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:245) at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:522) at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:401) at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:378) at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:1253) at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:474) at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:244) at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:125) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at (etc.)

Is this a bug, or am I doing it wrong? Cheers, Andrew.
Re: Problems downloading lucene 2.9.1
On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote: On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote: Hi folks, as we are using an snapshot dependecy to solr1.4, today we are getting problems when maven try to download lucene 2.9.1 (there isn't a any 2.9.1 there). Which repository can i use to download it? They won't be there until 2.9.1 is officially released. We are trying to speed up the Solr release by piggybacking on the Lucene release, but this little bit is the one downside. Until then, you can add a repo to: http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/
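Until the official artifacts show up, a temporary <repositories> entry along these lines should let Maven resolve them (the id is arbitrary; untested):

<repositories>
  <repository>
    <id>lucene-291-staging</id>
    <url>http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/</url>
  </repository>
</repositories>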
Re: adding and updating a lot of document to Solr, metadata extraction etc
Hi Eugene,

- ability to iterate over all documents, returned in search, as Lucene does provide within a HitCollector instance. We would need to extract and aggregate various fields, stored in index, to group results and aggregate them in some way. Also I did not find any way in the tutorial to access the search results with all fields to be processed by our application.

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

Check out Faceted Search; probably you can achieve your goal by using the Facet Component. There's also the Field Collapsing patch: http://wiki.apache.org/solr/FieldCollapsing

Alex
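As a quick illustration of the facet route, a request like http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category (field name made up) returns the aggregated counts per value of that field without pulling the individual documents back to the client.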
RE: Solr YUI autocomplete
Hey Amit, My index (i.e. Solr) was on a different domain, so I can't use XHR (XHR does not work with cross-domain proxyless data fetches). I tried using YUI's DS_ScriptNode but it didn't work. I completed my task by using jQuery and it worked well with solr. -Ankit

-Original Message-
From: Amit Nithian [mailto:anith...@gmail.com]
Sent: Monday, November 02, 2009 1:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr YUI autocomplete

I've used the YUI auto complete (albeit not with Solr, which shouldn't matter here) and it should work with JSON. I did one that simply made XHR calls over to a method on my server which returned pipe-delimited text, which worked fine. Are you using the XHR data source, and if so, what type are you telling it to expect? One of the examples on the YUI site is text based and I'm sure you can specify TYPE_JSON or JS_ARRAY too. - Amit

On Fri, Oct 30, 2009 at 7:04 AM, Ankit Bhatnagar abhatna...@vantage.com wrote: Does Solr support JSONP (JSON with Padding) in the response? -Ankit

-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
Sent: Friday, October 30, 2009 10:27 AM
To: 'solr-user@lucene.apache.org'
Subject: Solr YUI autocomplete

Hi Guys, I have a question regarding - how to specify the I am using the YUI autocomplete widget and it expects a JSONP response.

http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf=

I am not sure how I should specify the json.wrf=function. Thanks Ankit
question about collapse.type = adjacent
Hi, I would like to confirm if 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after*, since the collapse feature is primarily meant for presentation use. Thanks, Michael
Re: tracking solr response time
Thanks for the quick response @yonik How much of a latency compared to normal, and what version of Solr are you using? latency is usually around 2-4 secs (some times it goes more than that ) which happens to only 15-20% of the request other 80-85% of request are very fast it is in milli secs ( around 200,000 requests happens every day ) @Israel we are not using java client .. we r using python at the client with response formatted in json @yonikn @Israel does qtime measure the total time taken at the solr server ? I am already measuring the time to get the response at client end . I would want a means to know how much time the solr server is taking to respond (process ) once it gets the request . so that I could identify whether it is a solr server issue or internal network issue @Israel we are using rhel server 5 on both client and server .. we have 6 solr sever . one is acting as master . both client and solr sever are on the same network . those servers are dedicated solr server except 2 severs which have DB and memcahce running .. we have adjusted the load accordingly On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all the subclasses I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. 
http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be as a result of an internal network issue between the two servers or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is you client application connection to a Solr server on the same network or over the internet? Are there other applications like database servers etc running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out, there are timestamps there that can also guide you. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Solr YUI autocomplete
It does, have you looked at http://wiki.apache.org/solr/SolJSON?highlight=%28json%29#Using_Solr.27s_JSON_output_for_AJAX? Also, in my book on Solr there is an example, but using the jQuery autocomplete, which I think was answered earlier on the thread! Hope that helps.

ANKITBHATNAGAR wrote: Does Solr support JSONP (JSON with Padding) in the response? -Ankit

-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
Sent: Friday, October 30, 2009 10:27 AM
To: 'solr-user@lucene.apache.org'
Subject: Solr YUI autocomplete

Hi Guys, I have a question regarding - how to specify the I am using the YUI autocomplete widget and it expects a JSONP response.

http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf=

I am not sure how I should specify the json.wrf=function. Thanks Ankit
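For the JSONP piece specifically, adding a callback name to json.wrf is usually all it takes, e.g. http://localhost:8983/solr/select/?q=monitor&wt=json&json.wrf=handleResponse wraps the JSON body in handleResponse(...); handleResponse here is just a placeholder for whatever callback your widget registers.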
Re: Solr Cell on web-based files?
e.g. (doesn't work):

curl http://localhost:8983/solr/update/extract?extractOnly=true --data-binary @http://myweb.com/mylocalfile.htm -H Content-type:text/html

You might try remote streaming with Solr (see http://wiki.apache.org/solr/SolrConfigXml).

Yes, curl example:

curl 'http://localhost:8080/solr/main_index/extract/?extractOnly=true&indent=on&resource.name=lecture12&stream.url=http%3A//myweb.com/lecture12.ppt'

It works great for me. Alex
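Note that stream.url only works if remote streaming is switched on in solrconfig.xml, with something along these lines (the upload limit value here is arbitrary):

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />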
RE: Solr YUI autocomplete
Hey Eric, That's correct, however it didn't work with the YUI widget. I changed my approach to use jQuery for now. -Ankit

-Original Message-
From: Eric Pugh [mailto:ep...@opensourceconnections.com]
Sent: Monday, November 02, 2009 10:20 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr YUI autocomplete

It does, have you looked at http://wiki.apache.org/solr/SolJSON?highlight=%28json%29#Using_Solr.27s_JSON_output_for_AJAX? Also, in my book on Solr there is an example, but using the jQuery autocomplete, which I think was answered earlier on the thread! Hope that helps.

ANKITBHATNAGAR wrote: Does Solr support JSONP (JSON with Padding) in the response? -Ankit

-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
Sent: Friday, October 30, 2009 10:27 AM
To: 'solr-user@lucene.apache.org'
Subject: Solr YUI autocomplete

Hi Guys, I have a question regarding - how to specify the I am using the YUI autocomplete widget and it expects a JSONP response.

http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf=

I am not sure how I should specify the json.wrf=function. Thanks Ankit
storing other files in index directory
Are there any pitfalls to storing an arbitrary text file in the same directory as the solr index? We're slinging different versions of the index around while we're testing and it's hard to keep them straight. I'd like to put a readme.txt file in the directory that contains some history about how that index came to be. Is that harmless? Will it be ignored by solr, including during optimizations and any other operation, and will solr not delete it?
Re: tracking solr response time
Also, how about a sample of a fast and slow query? And is a slow query only slow the first time it's executed or every time? Best Erick On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Thanks for the quick response @yonik How much of a latency compared to normal, and what version of Solr are you using? latency is usually around 2-4 secs (some times it goes more than that ) which happens to only 15-20% of the request other 80-85% of request are very fast it is in milli secs ( around 200,000 requests happens every day ) @Israel we are not using java client .. we r using python at the client with response formatted in json @yonikn @Israel does qtime measure the total time taken at the solr server ? I am already measuring the time to get the response at client end . I would want a means to know how much time the solr server is taking to respond (process ) once it gets the request . so that I could identify whether it is a solr server issue or internal network issue @Israel we are using rhel server 5 on both client and server .. we have 6 solr sever . one is acting as master . both client and solr sever are on the same network . those servers are dedicated solr server except 2 severs which have DB and memcahce running .. we have adjusted the load accordingly On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all the subclasses I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. 
http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be as a result of an internal network issue between the two servers or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is you client application connection to a Solr server on the same network or over the internet? Are there other applications like database servers etc running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out, there are timestamps there that can also guide you. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
tokenize after filters
Is it possible to tokenize a field on whitespace after some filters have been applied? Ex: "A + W Root Beer": the field uses a keyword tokenizer to keep the string together, then it gets converted to "aw root beer" by a custom filter I've made. I now want to split that up into 3 tokens (aw, root, beer), but it seems like you can't use a tokenizer after a filter ... so what's the best way of accomplishing this? thx much --joe
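One approach that might work, if WordDelimiterFilterFactory's splitting behaviour suits you, is to let it break the single keyword token apart after your custom filter has run. A rough sketch, where mycompany.MyBrandFilterFactory stands in for your own filter:

<fieldType name="brand_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- your filter: "A + W Root Beer" becomes "aw root beer" as a single token -->
    <filter class="mycompany.MyBrandFilterFactory"/>
    <!-- splits that token on the remaining non-alphanumeric characters (the spaces) -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>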
Re: Annotations and reference types
On Thu, Oct 29, 2009 at 7:57 PM, M. Tinnemeyer marc-...@gmx.net wrote: Dear listusers, Is there a way to store an instance of class A (including the fields from myB) via solr using annotations? The index should look like: id; name; b_id; b_name

class A {
  @Field private String id;
  @Field private String name;
  @Field private B myB;
}

class B {
  @Field("b_id") private String id;
  @Field("B_name") private String name;
}

No. I guess you want to represent certain fields in class B and have them as an attribute in Class A (but all fields belong to the same schema), then it can be a worthwhile addition to Solrj. Can you open an issue? A patch would be even better :)

--
Regards, Shalin Shekhar Mangar.
Re: Question about DIH execution order
Hi Noble, I tried to understand your suggestions and played with different variations according to your reply. But none of them work. Can you explain it in more detail? Thanks a lot! BTW, do you mean your solution is as follows?

<document>
  <entity name="Course" transformer="TemplateTransformer" query="select * from Course">
    <field column="TmpCourseId" name="CourseId" template="Course:${Course.CourseId}" name="id"/>
    <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>

But 1) There is no TmpCourseId field column. 2) Can we put two name attributes, CourseId and id, in the same map? It seems not.

2009/11/1 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com

On Sun, Nov 1, 2009 at 11:59 PM, Bertie Shen bertie.s...@gmail.com wrote: Hi folks, I have the following data-config.xml. Is there a way to let transformation take place after executing the SQL "select comment from Rating where Rating.CourseId = ${Course.CourseId}"? In the MySQL database, column CourseId in table Course is an integer 1, 2, etc; template transformation will make them like Course:1, Course:2; column CourseId in table Rating is also an integer 1, 2, etc. If transformation happens before executing "select comment from Rating where Rating.CourseId = ${Course.CourseId}", then there will be no match for the SQL statement execution.

<document>
  <entity name="Course" transformer="TemplateTransformer" query="select * from Course">
    <field column="CourseId" template="Course:${Course.CourseId}" name="id"/>
    <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>

keep the field as follows

<field column="TmpCourseId" name="CourseId" template="Course:${Course.CourseId}" name="id"/>

--
- Noble Paul | Principal Engineer| AOL | http://aol.com
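If I read the suggestion correctly, the idea is to leave CourseId itself untouched (so the inner entity's ${Course.CourseId} still matches the raw integer in the database) and emit the templated value under a separate column that maps to the id field. Roughly, and untested:

<document>
  <entity name="Course" transformer="TemplateTransformer" query="select * from Course">
    <!-- templated copy goes to "id"; the raw CourseId stays available for the join below -->
    <field column="id" template="Course:${Course.CourseId}"/>
    <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>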
Re: tracking solr response time
On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Thanks for the quick response @yonik How much of a latency compared to normal, and what version of Solr are you using? latency is usually around 2-4 secs (some times it goes more than that ) which happens to only 15-20% of the request other 80-85% of request are very fast it is in milli secs ( around 200,000 requests happens every day ) @Israel we are not using java client .. we r using python at the client with response formatted in json @yonikn @Israel does qtime measure the total time taken at the solr server ? I am already measuring the time to get the response at client end . I would want a means to know how much time the solr server is taking to respond (process ) once it gets the request . so that I could identify whether it is a solr server issue or internal network issue It is the time spent at the Solr server. I think Yonik already answered this part in his response to your thread : This is what he said : QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). @Israel we are using rhel server 5 on both client and server .. we have 6 solr sever . one is acting as master . both client and solr sever are on the same network . those servers are dedicated solr server except 2 severs which have DB and memcahce running .. we have adjusted the load accordingly On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. 
If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all the subclasses I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be as a result of an internal network issue between the two servers or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is you client application connection to a Solr server on the same network or over the internet? Are there other applications like database servers etc running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out, there are timestamps there that can also guide you. -- Good Enough is not good enough. To give
Re: CPU utilization and query time high on Solr slave when snapshot install
If you are going to pull a new index every 10 minutes, try turning off cache autowarming. Your caches are never more than 10 minutes old, so spending a minute warming each new cache is a waste of CPU. Autowarm submits queries to the new Searcher before putting it in service. This will create a burst of query load on the new Searcher, often keeping one CPU pretty busy for several seconds. In solrconfig.xml, set autowarmCount to 0. Also, if you want the slaves to always have an optimized index, create the snapshot only in post-optimize. If you create snapshots in both post-commit and post-optimize, you are creating a non-optimized index (post-commit), then replacing it with an optimized one a few minutes later. A slave might get a non-optimized index one time, then an optimized one the next. wunder On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote: Hi Solr Gurus, We have solr in 1 master, 2 slave configuration. Snapshot is created post commit, post optimization. We have autocommit after 50 documents or 5 minutes. Snapshot puller runs as a cron every 10 minutes. What we have observed is that whenever snapshot is installed on the slave, we see solrj client used to query slave solr, gets timedout and there is high CPU usage/load avg. on slave server. If we stop snapshot puller, then slaves work with no issues. The system has been running since 2 months and this issue has started to occur only now when load on website is increasing. Following are some details: Solr Details: apache-solr Version: 1.3.0 Lucene - 2.4-dev Master/Slave configurations: Master: - for indexing data HTTPRequests are made on Solr server. - autocommit feature is enabled for 50 docs and 5 minutes - caching params are disable for this server - mergeFactor of 10 is set - we were running optimize script after every 2 hours, but now have reduced the duration to twice a day but issue still persists Slave1/Slave2: - standard requestHandler is being used - default values of caching are set Machine Specifications: Master: - 4GB RAM - 1GB JVM Heap memory is allocated to Solr Slave1/Slave2: - 4GB RAM - 2GB JVM Heap memory is allocated to Solr Master and Slave1 (solr1)are on single box and Slave2(solr2) on different box. We use HAProxy to load balance query requests between 2 slaves. Master is only used for indexing. Please let us know if somebody has ever faced similar kind of issue or has some insight into it as we guys are literally struck at the moment with a very unstable production environment. As a workaround, we have started running optimize on master every 7 minutes. This seems to have reduced the severity of the problem but still issue occurs every 2days now. please suggest what could be the root cause of this. Thanks, Bipul
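In solrconfig.xml terms that is just the autowarmCount attribute on the caches, e.g. (sizes here are only illustrative):

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>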
Re: tracking solr response time
@Israel: yes I got that point which yonik mentioned .. but is qtime the total time taken by solr server for that request or is it part of time taken by the solr for that request ( is there any thing that a solr server does for that particulcar request which is not included in that qtime bracket ) ? I am sorry for dragging in to this qtime. I just want to be sure, as we observed many times there is huge mismatch between qtime and time measured at the client for the response ( does this imply it is due to internal network issue ) @Erick: yes, many times query is slow first time its executed is there any solution to improve upon this factor .. for querying we use DisMaxRequestHandler , queries are quite long with many faceting parameters . On Mon, Nov 2, 2009 at 10:46 PM, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Thanks for the quick response @yonik How much of a latency compared to normal, and what version of Solr are you using? latency is usually around 2-4 secs (some times it goes more than that ) which happens to only 15-20% of the request other 80-85% of request are very fast it is in milli secs ( around 200,000 requests happens every day ) @Israel we are not using java client .. we r using python at the client with response formatted in json @yonikn @Israel does qtime measure the total time taken at the solr server ? I am already measuring the time to get the response at client end . I would want a means to know how much time the solr server is taking to respond (process ) once it gets the request . so that I could identify whether it is a solr server issue or internal network issue It is the time spent at the Solr server. I think Yonik already answered this part in his response to your thread : This is what he said : QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). @Israel we are using rhel server 5 on both client and server .. we have 6 solr sever . one is acting as master . both client and solr sever are on the same network . those servers are dedicated solr server except 2 severs which have DB and memcahce running .. we have adjusted the load accordingly On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. 
The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all the subclasses I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request.
RE: Lucene FieldCache memory requirements
Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad
Re: Lucene FieldCache memory requirements
Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad
LocalSolr, Maven, build files and release candidates (Just for info) and spatial radius (A question)
Hallo All. I've been trying to prepare a project using localsolr for the impending (I hope) arrival of solr 1.4 and Lucene 2.9.1.. Here are some notes in case anyone else is suffering similarly. Obviously everything here may change by next week. First problem has been the lack of any stable maven based lucene and solr artifacts to wire into my poms. Because of that, and as an interim only measure, I've built the latest branches of the lucene 2.9.1 and solr 1.4 trees and made them into a *temporary* maven repository at http://developer.k-int.com/m2snapshots/. in there you can find all the jar artifacts tagged as xxx-ki-rc1 (For solr) and xxx-ki-rc3 (For lucene) and finally, a localsolr.localsolr build tagged as 1.5.2-rc1. Sorry for the naming, but I don't want these artifacts to clash with the real ones when they come along. This is really just for my own use, but I've seen messages and spoken to people who are really struggling to get their maven deps right, if this helps anyone, please feel free to use these until the real apache artifacts appear. I can't take any responsibility for their quality. All the poms have been altered to look for the correct dependent artifacts in the same repository, adding the stanza !-- Emergency repository for storing interim builds of lucene and solr whilst they sort their act out -- repositories repository idk-int-m2-snapshots/id nameK-int M2 Snapshots/name urlhttp://developer.k-int.com/m2snapshots/url releases enabledtrue/enabled /releases /repository /repositories to your pom will let you use these deps temporarily until we see an official build. If you're a maven developer and I've gone way around the houses with this, please tell me of an easier solution :) This repo *will* go away when the real builds turn up. The localsolr in this repo also contains the patches I've submitted (A good while ago) to the localsolr project to make it build with the lucene 2.9.1 rc3 as the downloadable dist is currently built against an older 2.9 release that had a different API (IE won't work with the new lucene and solr) All this means that there is a working localsolr build. Second up, I've also seen emails (And seen the exception myself) around asking about the following when trying to get all these revisions working together. java.lang.NumberFormatException: Invalid shift value in prefixCoded string (is encoded value really a LONG?) There are some threads out there telling you that the Lucene indexes are not binary compatible between versions, but if you're using localsolr, what you really need to know is: 1) Make sure that your schema.xml contains at least the following fieldType defs fieldType name=tdouble class=solr.TrieDoubleField precisionStep=8 omitNorms=true positionIncrementGap=0/ 2) Convert your old solr sdouble fields to tdoubles: field name=lat type=tdouble indexed=true stored=true/ field name=lng type=tdouble indexed=true stored=true/ dynamicField name=_local* type=tdouble indexed=true stored=true/ Pretty sure you would need to rebuild your indexes. Ok, with those changes I managed to get a working spatial search. My only problem now is that the radius param on the command line seems to need to be way bigger than it needs to be in order to find anything. Specifically, if I search with a radius of 220 I get a record back which marks it's geo_distance as 83.76888211666025. Shuffling the radius around ends up that a radius of 205 returns that doc, 204 and it's filtered. 
I'm going to dig into this now, but if anyone knows about this I'd really appreciate any help. Cheers all, hope this is of use to someone out there, if anyone has corrections/comments I'd really appreciate any info. Best, Ian.
Re: question about collapse.type = adjacent
Hi Micheal, Field collapsing is basicly done in two steps. The first step is to get the uncollapsed sorted (whether it is score or a field value) documents and the second step is to apply the collapse algorithm on the uncollapsed documents. So yes, when specifying collapse.type=adjacent the documents can get collapsed after the sort has been applied, but this also the case when not specifying collapse.type=adjacent I hope this answers your question. Cheers, Martijn 2009/11/2 michael8 mich...@saracatech.com: Hi, I would like to confirm if 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after* since collapse feature primarily is meant for presentation use. Thanks, Michael -- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
apply a patch on solr
Hi, First I like to pardon my novice question on patching solr (1.4). What I like to know is, given a patch, like the one for collapse field, how would one go about knowing what solr source that patch is meant for since this is a source level patch? Wouldn't the exact versions of a set of java files to be patched critical for the patch to work properly? So far what I have done is to pull the latest collapse field patch down from http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), and then svn up the latest trunk from http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. Intuitively I was thinking I should be doing svn up to a specific revision/tag instead of just latest. So far everything seems fine, but I just want to make sure I'm doing the right thing and not just being lucky. Thanks, Michael -- View this message in context: http://old.nabble.com/apply-a-patch-on-solr-tp26157826p26157826.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Lucene FieldCache memory requirements
I am not using Lucene API directly; I am using SOLR which uses Lucene FieldCache for faceting on non-tokenized fields... I think this cache will be lazily loaded, until user executes sorted (by this field) SOLR query for all documents *:* - in this case it will be fully populated... Subject: Re: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad
Dismax and Standard Queries together
Hi, I have three fields, business_name, category_name, and sub_category_name, in my solrconfig file. My query = pet clinic. Example sub_category_names: Veterinarians, Kennels, Veterinary Clinics Hospitals, Pet Grooming, Pet Stores, Clinics. My ideal requirement is dismax searching where a. dismax matches over three or two of the fields, b. failing that, a Boolean match on any one of the fields is acceptable. I played around with the minimum match attribute, but it doesn't seem to help; I guess dismax requires at least two fields to match. Nested queries take only one qf field, so they don't help much either. Any suggestions will be helpful. Thanks Ram -- View this message in context: http://old.nabble.com/Dismax-and-Standard-Queries-together-tp26157830p26157830.html Sent from the Solr - User mailing list archive at Nabble.com.
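One note on the "nested queries take only one qf field" point: the qf local param accepts a whitespace-separated list of fields just like the regular dismax qf, it only has to be quoted. A hedged sketch of combining a multi-field dismax clause with a plain boolean clause in a single standard-parser query follows; the field names are the ones from the message above, while the boost and mm values are only illustrative, and whether mm as a local param captures the "two of three fields" requirement would need testing:

q=_query_:"{!dismax qf='business_name^2 category_name sub_category_name' mm=2}pet clinic" OR sub_category_name:(pet clinic)

The dismax clause handles the multi-field scoring; the trailing OR clause lets a document that matches only one field still be returned.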
RE: tokenize after filters
I think you want Koji Sekiguchi's Char Filters: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=char+filters#Char_Filters Steve -Original Message- From: Joe Calderon [mailto:calderon@gmail.com] Sent: Monday, November 02, 2009 11:25 AM To: solr-user@lucene.apache.org Subject: tokenize after filters Is it possible to tokenize a field on whitespace after some filters have been applied? Ex: A + W Root Beer. The field uses a keyword tokenizer to keep the string together, then it gets converted to aw root beer by a custom filter I've made. I now want to split that up into 3 tokens (aw, root, beer), but it seems like you can't use a tokenizer after a filter ... so what's the best way of accomplishing this? thx much --joe
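A sketch of the shape of the char-filter approach Steve points to: the normalisation runs over the raw character stream before the tokenizer, so a plain whitespace tokenizer can then split the cleaned-up text. The fieldType name and mapping file below are placeholders; note that a simple mapping file may not be able to collapse "A + W" into the single token "aw", in which case the existing custom filter logic might have to be ported to a CharFilter rather than expressed as mappings.

<fieldType name="text_norm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- applied to the character stream before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-normalize.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>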
field queries seem slow
I took a look through my Solr logs this weekend and noticed that the longest queries were on particular fields, like author:albert einstein. Is this a result consistent with other setups out there? If not, Is there a trick to make these go faster? I've read up on filter queries and use those when applicable, but they don't really solve all my problems. If anybody wants to take a shot at it but needs to see my solrconfig, etc just let me know. Cheers, Mike
manually creating indices to speed up indexing with app-knowledge
This may seem like a strange question, but here it goes anyway. Im considering the possibility of low-level constructing indices for about 20.000 indexed fields (type sInt) if at all possible . (With indices in this context I mean the inverted indices from term to Documentid just to be 100% complete) These indices have to be recreated each night, along with the normal reindex. Globally it should go something like this (each night) : - documents (consisting of about 20 stored fields and about 10 stored indexed fields) are indexed through the normal 'code-path' (solrJ in my case) - After all docs are persisted (max 200.000) I want to extract the mapping from 'lucene docid' -- 'stored/indexed product key' I believe this should work, because after all docs are persisted the internal docids aren't altered, so the relationship between 'lucene docid' -- 'stored/indexed product key' is invariant from that point forward. (please correct if wrong) - construct the 20.000 inverted indices on such a low enough level that I do not have to go through IndexWriter if possible, so I do not need to construct Documents, I only need to construct the native format of the indices themselves. Ideally this should work on multiple servers so that the indices can be created in parallel and the index-files later simply copied to the index-directory of the master. Basically what it boils down to is that indexing time (a reindex should be done each night) is a big show-stopper at the moment, although we've tried and tested all the more standard optimization tricks techniques, as well as having build a home-grown shard-like indexing strategy which uses 20 pretty big servers in parallel. The 20.000 indexed fields are still simply killing. At the same time the app has a lot of knowledge of the 20.000 indices. - All indices consist of prices (ints) between 0 and 10.000 - and most important: as part of the document construction process the ordening of each of the 20.000 indices is known for all documents that are processed by the document-construction server in question. (This part is needed, and is already performing at light speed) for sake of argument say we have 5 document-construction servers. Each server processes 40.000 documents. Each server has 20.000 ordered indices in its own format readily available for the 40.000 documents it's processing. Something like: LinkedHashMapInteger,SetInteger -- price,{productids} Say we have 20 indexing servers. Each server has to calculate 1.000 indices (totalling the 20.000) We have the 5 doc-construction servers distribute the ordered sub-indices to the correct servers. Each server constructs an index from 5 ordered sub-indices coming from 5 different construction-servers. This can be done efficiently using a mergesort (since the sub-indices are already sorted) All that is missing (oversimplifying here ) is going from the ordered indices in application-format to the index-format of lucene (substituting the productids by the lucene docid's along the way) and stream it to disk. I believe this would quite posisbly give a really big indexing improvement. Is my thinking correct in the steps involved? Do you believe that this indeed would give a big speedup for this specific situation Where would I hook in the SOlr / lucene code to construct the native format? 
Thanks in advance (and for making it to here) Geert-Jan -- View this message in context: http://old.nabble.com/manually-creating-indices-to-speed-up-indexing-with-app-knowledge-tp26157851p26157851.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: apply a patch on solr
You can see what revision the patch was written for at the top of the patch, it will look like this: Index: org/apache/solr/handler/MoreLikeThisHandler.java === --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437) +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy) now check out revision 772437 using the --revision switch in svn, patch away, and then svn up to make sure everything merges cleanly. This is a good guide to follow as well: http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html cheers, -mike On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com wrote: Hi, First I like to pardon my novice question on patching solr (1.4). What I like to know is, given a patch, like the one for collapse field, how would one go about knowing what solr source that patch is meant for since this is a source level patch? Wouldn't the exact versions of a set of java files to be patched critical for the patch to work properly? So far what I have done is to pull the latest collapse field patch down from http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), and then svn up the latest trunk from http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. Intuitively I was thinking I should be doing svn up to a specific revision/tag instead of just latest. So far everything seems fine, but I just want to make sure I'm doing the right thing and not just being lucky. Thanks, Michael -- View this message in context: http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html Sent from the Solr - User mailing list archive at Nabble.com.
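For anyone following along, the sequence Mike describes looks roughly like this (a sketch; the revision number is the one from his example patch header, and the -p level has to match the paths inside the actual field-collapse-5.patch):

svn checkout --revision 772437 http://svn.apache.org/repos/asf/lucene/solr/trunk/ solr-trunk
cd solr-trunk
patch -p0 --dry-run -i field-collapse-5.patch    # verify every hunk applies before touching the tree
patch -p0 -i field-collapse-5.patch
svn up                                           # merge up to the latest trunk, resolve any conflicts
ant dist                                         # build the patched Solr

If svn up reports conflicts in patched files, that usually means the patch was written against an older revision than the one you updated to.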
highlighting error using 1.4rc
Hi, I've tried installing the latest (3rd) RC for Solr 1.4 and Lucene 2.9.1. One of our integration tests, which runs against and embedded server appears to be failing on highlighting. I've included the stack trace and the configuration from solrconf. I'd appreciate any insights. Please let me know what additional information would be useful. Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at org.bookshare.search.solr.SolrSearchServerWrapper.query(SolrSearchServerWrapper.java:96) ... 29 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141) ... 32 more Caused by: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:489) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:484) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:249) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:230) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139) ... 32 more I see in our solrconf the following for highlighting. 
highlighting !-- Configure the standard fragmenter -- !-- This could most likely be commented out in the default case -- fragmenter name=gap class=org.apache.solr.highlight.GapFragmenter default=true lst name=defaults int name=hl.fragsize100/int /lst /fragmenter !-- A regular-expression-based fragmenter (f.i., for sentence extraction) -- fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter lst name=defaults !-- slightly smaller fragsizes work better because of slop -- int name=hl.fragsize70/int !-- allow 50% slop on fragment sizes -- float name=hl.regex.slop0.5/float !-- a basic sentence pattern -- str name=hl.regex.pattern[-\w ,/\n\']{20,200}/str /lst /fragmenter !-- Configure the standard formatter -- formatter name=html class=org.apache.solr.highlight.HtmlFormatter default=true lst name=defaults str name=hl.simple.pre![CDATA[strong]]/str str name=hl.simple.post![CDATA[/strong]]/str /lst /formatter /highlighting Thanks, Jake
Question regarding snapinstaller
It looks like the snapinstaller script does an atomic remove and replace of the entire solr_home/data_dir/index folder with the contents of the new snapshot before issuing a commit command. I am trying to understand the implication of the same. What happens to queries that come during the time interval between the instant the existing directory is removed and the commit command gets finalized? Does a currently running instance of Solr not need the files in the index folder to serve the query results? Are all the contents of the index folder loaded into memory? Thanks in advance for any help. Regards, Prasanna.
Re: tracking solr response time
So I need someone with better knowledge to chime in here with an opinion on whether autowarming would help since the whole faceting thing is something I'm not very comfortable with... hint, hint, hint Erick On Mon, Nov 2, 2009 at 2:21 PM, bharath venkatesh bharathv6.proj...@gmail.com wrote: @Israel: yes I got that point which yonik mentioned .. but is qtime the total time taken by solr server for that request or is it part of time taken by the solr for that request ( is there any thing that a solr server does for that particulcar request which is not included in that qtime bracket ) ? I am sorry for dragging in to this qtime. I just want to be sure, as we observed many times there is huge mismatch between qtime and time measured at the client for the response ( does this imply it is due to internal network issue ) @Erick: yes, many times query is slow first time its executed is there any solution to improve upon this factor .. for querying we use DisMaxRequestHandler , queries are quite long with many faceting parameters . On Mon, Nov 2, 2009 at 10:46 PM, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Thanks for the quick response @yonik How much of a latency compared to normal, and what version of Solr are you using? latency is usually around 2-4 secs (some times it goes more than that ) which happens to only 15-20% of the request other 80-85% of request are very fast it is in milli secs ( around 200,000 requests happens every day ) @Israel we are not using java client .. we r using python at the client with response formatted in json @yonikn @Israel does qtime measure the total time taken at the solr server ? I am already measuring the time to get the response at client end . I would want a means to know how much time the solr server is taking to respond (process ) once it gets the request . so that I could identify whether it is a solr server issue or internal network issue It is the time spent at the Solr server. I think Yonik already answered this part in his response to your thread : This is what he said : QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). @Israel we are using rhel server 5 on both client and server .. we have 6 solr sever . one is acting as master . both client and solr sever are on the same network . those servers are dedicated solr server except 2 severs which have DB and memcahce running .. we have adjusted the load accordingly On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using solr for many of ur products it is doing quite well . But since no of hits are becoming high we are experiencing latency in certain requests ,about 15% of our requests are suffering a latency How much of a latency compared to normal, and what version of Solr are you using? . We are trying to identify the problem . It may be due to network issue or solr server is taking time to process the request . other than qtime which is returned along with the response is there any other way to track solr servers performance ? 
how is qtime calculated , is it the total time from when solr server got the request till it gave the response ? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. can we do some extra logging to track solr servers performance . ideally I would want to pass some log id along with the request (query ) to solr server and solr server must log the response time along with that log id . Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options:
Re: field queries seem slow
H, are you sorting? And has your readers been reopened? Is the second query of that sort also slow? If the answer to this last question is no, have you tried some autowarming queries? Best Erick On Mon, Nov 2, 2009 at 4:34 PM, mike anderson saidthero...@gmail.comwrote: I took a look through my Solr logs this weekend and noticed that the longest queries were on particular fields, like author:albert einstein. Is this a result consistent with other setups out there? If not, Is there a trick to make these go faster? I've read up on filter queries and use those when applicable, but they don't really solve all my problems. If anybody wants to take a shot at it but needs to see my solrconfig, etc just let me know. Cheers, Mike
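On the autowarming suggestion: besides the cache autowarmCount settings, solrconfig.xml can fire explicit warming queries each time a new searcher is opened, which is the usual cure for "the first query after a commit is slow". A sketch using the stock QuerySenderListener; the query strings are placeholders, not taken from Mike's logs:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">author:"albert einstein"</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>

firstSearcher runs once at startup and newSearcher on every commit/replication, so representative sort and facet queries usually go in both.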
Re: Question about DIH execution order
Bertie, Not sure what you are trying to do, we need a clearer description of what select * returns and what you want to end up in the index. But to answer your question The transformations happen after DIH has performed the SQL statement. In fact the rows output from the SQL command are assigned to the DIH fields and then any transformations are applied. The examples in http://wiki.apache.org/solr/DataImportHandler are quite good. Hi Noble, I tried to understand your suggestions and played different variations according to your reply. But none of them work. Can you explain it in more details? Thanks a lot! BTW, do you mean your solution as follows? document entity name=Course transformer= TemplateTransformer query=select * from Course field column=TmpCourseId name=CourseId template=Course:${Course.CourseId} name=id/ entity name=Rating query=select comment from Rating where Rating.CourseId = ${Course.CourseId} field column=comment name=review/ /entity /entity /document But 1) There is no TmpCourseId field column. 2) Can we put two name CourseId and id in the same map? It seems not. 2009/11/1 Noble Paul ?? Â Ë³Ë noble.p...@corp.aol.com On Sun, Nov 1, 2009 at 11:59 PM, Bertie Shen bertie.s...@gmail.com wrote: Hi folks, I have the following data-config.xml. Is there a way to let transformation take place after executing SQL select comment from Rating where Rating.CourseId = ${Course.CourseId}? In MySQL database, column CourseId in table Course is integer 1, 2, etc; template transformation will make them like Course:1, Course:2; column CourseId in table Rating is also integer 1, 2, etc. If transformation happens before executing select comment from Rating where Rating.CourseId = ${Course.CourseId}, then there will no match for the SQL statement execution. document entity name=Course transformer=TemplateTransformer query=select * from Course field column=CourseId template=Course:${Course.CourseId} name=id/ entity name=Rating query=select comment from Rating where Rating.CourseId = ${Course.CourseId} field column=comment name=review/ /entity /entity /document keep the field as follows field column=TmpCourseId name=CourseId template=Course:${Course.CourseId} name=id/ -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
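Reading Noble's suggestion, the intent seems to be to keep the raw CourseId column untouched, so that ${Course.CourseId} still resolves to the plain integer for the Rating sub-query, and to have the TemplateTransformer write its output into a separate column that maps onto the Solr id field. A sketch of that reading; the solr_id column name is my own placeholder, and whether this matches Noble's exact intent is an assumption:

<document>
  <entity name="Course" transformer="TemplateTransformer" query="select * from Course">
    <!-- new column created by the transformer; CourseId itself stays available for the child entity -->
    <field column="solr_id" name="id" template="Course:${Course.CourseId}"/>
    <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>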
Re: Lucene FieldCache memory requirements
OK I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe up. For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. (Each also consume negligible (for your case) memory to hold the actual string values). Note that for your use case, this is exceptionally wasteful. If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, since you have only 10 unique string values. Mike On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote: I am not using Lucene API directly; I am using SOLR which uses Lucene FieldCache for faceting on non-tokenized fields... I think this cache will be lazily loaded, until user executes sorted (by this field) SOLR query for all documents *:* - in this case it will be fully populated... Subject: Re: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad
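Putting rough numbers on that for the 100-million-document, 10-country case (back-of-the-envelope, based only on the per-document costs Mike gives above):

getStringIndex (what sorting/faceting uses): order[] = 100,000,000 x 4 bytes ≈ 400 MB, plus a lookup[] of ~10 unique country strings, which is negligible.
getStrings: 100,000,000 object references x 4 bytes (32-bit JVM) ≈ 400 MB, or x 8 bytes (64-bit) ≈ 800 MB, again plus the ~10 string values themselves.

Either way the cost is driven almost entirely by maxDoc rather than by the number of distinct values, which matches Fuad's intuition that cardinality doesn't matter; the StringIndex case comes in below the 6-bytes-per-document (600 MB) estimate in the original question.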
Re: highlighting error using 1.4rc
Umm - crap. This looks looks like a bug in a fix that just went in. My fault on the review. I'll fix it tonight when I get home - unfortunetly, both lucene and sold are about to be released... - Mark http://www.lucidimagination.com (mobile) On Nov 2, 2009, at 5:17 PM, Jake Brownell ja...@benetech.org wrote: Hi, I've tried installing the latest (3rd) RC for Solr 1.4 and Lucene 2.9.1. One of our integration tests, which runs against and embedded server appears to be failing on highlighting. I've included the stack trace and the configuration from solrconf. I'd appreciate any insights. Please let me know what additional information would be useful. Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request (EmbeddedSolrServer.java:153) at org.apache.solr.client.solrj.request.QueryRequest.process (QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query (SolrServer.java:118) at org.bookshare.search.solr.SolrSearchServerWrapper.query (SolrSearchServerWrapper.java:96) ... 29 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request (EmbeddedSolrServer.java:141) ... 32 more Caused by: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields( WeightedSpanTermExtractor.java:489) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields( WeightedSpanTermExtractor.java:484) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms( WeightedSpanTermExtractor.java:249) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract (WeightedSpanTermExtractor.java:230) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract (WeightedSpanTermExtractor.java:158) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms( WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor (QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init (QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments (Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting (DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process (HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody (SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java: 1316) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request (EmbeddedSolrServer.java:139) ... 32 more I see in our solrconf the following for highlighting. 
highlighting !-- Configure the standard fragmenter -- !-- This could most likely be commented out in the default case -- fragmenter name=gap class=org.apache.solr.highlight.GapFragmenter default=true lst name=defaults int name=hl.fragsize100/int /lst /fragmenter !-- A regular-expression-based fragmenter (f.i., for sentence extraction) -- fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter lst name=defaults !-- slightly smaller fragsizes work better because of slop -- int name=hl.fragsize70/int !-- allow 50% slop on fragment sizes -- float name=hl.regex.slop0.5/float !-- a basic sentence pattern -- str name=hl.regex.pattern[-\w ,/\n\']{20,200}/str /lst /fragmenter !-- Configure the standard formatter -- formatter name=html class=org.apache.solr.highlight.HtmlFormatter default=true lst name=defaults str name=hl.simple.pre![CDATA[strong]]/str str name=hl.simple.post![CDATA[/strong]]/str /lst /formatter /highlighting Thanks, Jake
Re: Spell check suggestion and correct way of implementation and some Questions
Hello everybody, I am able to use the spell checker but I have some questions, if someone can answer them. If I search the free-text word waranty then I get back the suggestion warranty, which is fine. But if I do a search on a field, for example description:waranty, the output collation element is description:warranty, which I don't want; I want to get back only the text, i.e. warranty. We are using collation to return the results since, if a user types three words, we use the collation in the response element to display the spelling suggestion. Any advice? darniz -- View this message in context: http://old.nabble.com/Spell-check-suggestion-and-correct-way-of-implementation-and-some-Questions-tp26096664p26157893.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CPU utilization and query time high on Solr slave when snapshot install
Hmm...I think you have to setup warming queries yourself and that autowarm just copies entries from the old cache to the new cache, rather than issuing queries - the value is how many entries it will copy. Though that's still going to take CPU and time. - Mark http://www.lucidimagination.com (mobile) On Nov 2, 2009, at 12:47 PM, Walter Underwood wun...@wunderwood.org wrote: If you are going to pull a new index every 10 minutes, try turning off cache autowarming. Your caches are never more than 10 minutes old, so spending a minute warming each new cache is a waste of CPU. Autowarm submits queries to the new Searcher before putting it in service. This will create a burst of query load on the new Searcher, often keeping one CPU pretty busy for several seconds. In solrconfig.xml, set autowarmCount to 0. Also, if you want the slaves to always have an optimized index, create the snapshot only in post-optimize. If you create snapshots in both post-commit and post-optimize, you are creating a non- optimized index (post-commit), then replacing it with an optimized one a few minutes later. A slave might get a non-optimized index one time, then an optimized one the next. wunder On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote: Hi Solr Gurus, We have solr in 1 master, 2 slave configuration. Snapshot is created post commit, post optimization. We have autocommit after 50 documents or 5 minutes. Snapshot puller runs as a cron every 10 minutes. What we have observed is that whenever snapshot is installed on the slave, we see solrj client used to query slave solr, gets timedout and there is high CPU usage/load avg. on slave server. If we stop snapshot puller, then slaves work with no issues. The system has been running since 2 months and this issue has started to occur only now when load on website is increasing. Following are some details: Solr Details: apache-solr Version: 1.3.0 Lucene - 2.4-dev Master/Slave configurations: Master: - for indexing data HTTPRequests are made on Solr server. - autocommit feature is enabled for 50 docs and 5 minutes - caching params are disable for this server - mergeFactor of 10 is set - we were running optimize script after every 2 hours, but now have reduced the duration to twice a day but issue still persists Slave1/Slave2: - standard requestHandler is being used - default values of caching are set Machine Specifications: Master: - 4GB RAM - 1GB JVM Heap memory is allocated to Solr Slave1/Slave2: - 4GB RAM - 2GB JVM Heap memory is allocated to Solr Master and Slave1 (solr1)are on single box and Slave2(solr2) on different box. We use HAProxy to load balance query requests between 2 slaves. Master is only used for indexing. Please let us know if somebody has ever faced similar kind of issue or has some insight into it as we guys are literally struck at the moment with a very unstable production environment. As a workaround, we have started running optimize on master every 7 minutes. This seems to have reduced the severity of the problem but still issue occurs every 2days now. please suggest what could be the root cause of this. Thanks, Bipul
Re: solr search
The problem is in db-dataconfig.xml. You should start with the example DataImportHandler configuration fles. The structure is wrong. First there is a datasource, then there are 'entities' which fetch a document's fields from the datasource. On Fri, Oct 30, 2009 at 9:03 PM, manishkbawne manish.ba...@gmail.com wrote: Hi, I have made following changes in solrconfig.xml requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configC:/Apache-Tomcat/apache-tomcat-6.0.20/solr/conf/db-data-config.xml/str /lst /requestHandler in db-dataconfig.xml dataConfig document name=id1 dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://servername:1433/databasename user=sa password=p...@123/ entity name=id1 query=select id from be field column=id name=id1 / /entity /document /dataConfig in schema.xml files field name=id1 type=string indexes=true default=none/ Please suggest me the possible cause of error?? Lance Norskog-2 wrote: Please post your dataimporthandler configuration file. On Fri, Oct 30, 2009 at 4:17 AM, manishkbawne manish.ba...@gmail.com wrote: Thanks for your reply .. I am trying to use the database for solr search but getting this error.. abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.NullPointerException at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:95) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) at org.apache.solr.core.SolrResourceLoader Can you please suggest me some possible solution? Karsten F. wrote: hi manishkbawne, unspecific ideas of search improvements are her: http://wiki.apache.org/solr/SolrPerformanceFactors I really like the last idea in http://wiki.apache.org/lucene-java/ImproveSearchingSpeed : Use a profiler and ask a more specific question in this forum. Best regards Karsten manishkbawne wrote: I am using solr search to search through xml files. As I am working on millions of data, the result output is slower. Can anyone please suggest me some way, by which I can increase the search result output? -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26128341.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/solr-search-tp26125183p26139946.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
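For comparison with the structure Lance describes, a minimal db-data-config.xml keeps the dataSource as a direct child of dataConfig (not nested inside document) and puts the entity, with its field mappings, inside document. The driver, URL, and credentials below are just the placeholders from the original message, trimmed to the essentials:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://servername:1433/databasename"
              user="sa" password="..."/>
  <document>
    <entity name="be" query="select id from be">
      <field column="id" name="id1"/>
    </entity>
  </document>
</dataConfig>

The field element maps the SQL column id onto the schema field id1, matching the field declared in schema.xml above.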
Re: solr web ui
This is what I meant to mention - Uri's GWT browser, not the Velocity toolkit. On Fri, Oct 30, 2009 at 1:20 PM, Grant Ingersoll gsing...@apache.org wrote: There is also a GWT contribution in JIRA that is pretty handy and will likely be added in 1.5. See http://issues.apache.org/jira/browse/SOLR-1163 -Grant On Oct 29, 2009, at 9:17 PM, scabbage wrote: Hi, I'm a new solr user. I would like to know if there are any easy to setup web UIs for solr. It can be as simple as a search box, term highlighting and basic faceting. Basically I'm using solr to store all our automation testing logs and would like to have a simple searchable UI. I don't wanna spent too much time writing my own. Thanks. -- View this message in context: http://www.nabble.com/solr-web-ui-tp26123604p26123604.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
RE: Lucene FieldCache memory requirements
Thank you very much Mike, I found it: org.apache.solr.request.SimpleFacets ... // TODO: future logic could use filters instead of the fieldcache if // the number of terms in the field is small enough. counts = getFieldCacheCounts(searcher, base, field, offset,limit, mincount, missing, sort, prefix); ... FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName); final String[] terms = si.lookup; final int[] termNum = si.order; ... So that 64-bit requires more memory :) Mike, am I right here? [(8 bytes pointer) + (4 bytes DocID)] x [Number of Documents (100mlns)] (64-bit JVM) 1.2Gb RAM for this... Or, may be I am wrong: For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. [8 bytes (64bit)] x [number of documents (100mlns)]? 0.8Gb Kind of Map between String and DocSet, saving 4 bytes... Key is String, and Value is array of 64-bit pointers to Document. Why 64-bit (for 64-bit JVM)? I always thought it is (int) documentId... Am I right? Thanks for pointing to http://issues.apache.org/jira/browse/LUCENE-1990! Note that for your use case, this is exceptionally wasteful. This is probably very common case... I think it should be confirmed by Lucene developers too... FieldCache is warmed anyway, even when we don't use SOLR... -Fuad -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: November-02-09 6:00 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements OK I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe up. For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. (Each also consume negligible (for your case) memory to hold the actual string values). Note that for your use case, this is exceptionally wasteful. If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, since you have only 10 unique string values. Mike On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote: I am not using Lucene API directly; I am using SOLR which uses Lucene FieldCache for faceting on non-tokenized fields... I think this cache will be lazily loaded, until user executes sorted (by this field) SOLR query for all documents *:* - in this case it will be fully populated... Subject: Re: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... 
Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad
Re: CPU utilization and query time high on Solr slave when snapshot install
So assuming you set up a few sample sort queries to run in the firstSearcher config, and had very low query volume during that ten minutes so that there were no evictions before a new Searcher was loaded, would those queries run by the firstSearcher be passed along to the cache for the next Searcher as part of the autowarm? If so, it seems like you might want to load a few sort queries for the firstSearcher, but might not need any included in the newSearcher? -Jay On Mon, Nov 2, 2009 at 4:26 PM, Mark Miller markrmil...@gmail.com wrote: Hmm...I think you have to setup warming queries yourself and that autowarm just copies entries from the old cache to the new cache, rather than issuing queries - the value is how many entries it will copy. Though that's still going to take CPU and time. - Mark http://www.lucidimagination.com (mobile) On Nov 2, 2009, at 12:47 PM, Walter Underwood wun...@wunderwood.org wrote: If you are going to pull a new index every 10 minutes, try turning off cache autowarming. Your caches are never more than 10 minutes old, so spending a minute warming each new cache is a waste of CPU. Autowarm submits queries to the new Searcher before putting it in service. This will create a burst of query load on the new Searcher, often keeping one CPU pretty busy for several seconds. In solrconfig.xml, set autowarmCount to 0. Also, if you want the slaves to always have an optimized index, create the snapshot only in post-optimize. If you create snapshots in both post-commit and post-optimize, you are creating a non-optimized index (post-commit), then replacing it with an optimized one a few minutes later. A slave might get a non-optimized index one time, then an optimized one the next. wunder On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote: Hi Solr Gurus, We have solr in 1 master, 2 slave configuration. Snapshot is created post commit, post optimization. We have autocommit after 50 documents or 5 minutes. Snapshot puller runs as a cron every 10 minutes. What we have observed is that whenever snapshot is installed on the slave, we see solrj client used to query slave solr, gets timedout and there is high CPU usage/load avg. on slave server. If we stop snapshot puller, then slaves work with no issues. The system has been running since 2 months and this issue has started to occur only now when load on website is increasing. Following are some details: Solr Details: apache-solr Version: 1.3.0 Lucene - 2.4-dev Master/Slave configurations: Master: - for indexing data HTTPRequests are made on Solr server. - autocommit feature is enabled for 50 docs and 5 minutes - caching params are disable for this server - mergeFactor of 10 is set - we were running optimize script after every 2 hours, but now have reduced the duration to twice a day but issue still persists Slave1/Slave2: - standard requestHandler is being used - default values of caching are set Machine Specifications: Master: - 4GB RAM - 1GB JVM Heap memory is allocated to Solr Slave1/Slave2: - 4GB RAM - 2GB JVM Heap memory is allocated to Solr Master and Slave1 (solr1)are on single box and Slave2(solr2) on different box. We use HAProxy to load balance query requests between 2 slaves. Master is only used for indexing. Please let us know if somebody has ever faced similar kind of issue or has some insight into it as we guys are literally struck at the moment with a very unstable production environment. As a workaround, we have started running optimize on master every 7 minutes. 
This seems to have reduced the severity of the problem but still issue occurs every 2days now. please suggest what could be the root cause of this. Thanks, Bipul
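For reference, the concrete change Walter is describing lives in the cache declarations of the slaves' solrconfig.xml: set autowarmCount to 0 so nothing is carried over when a snapshot is installed. The class and size values below are just the stock example settings, not taken from Bipul's config:

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Explicit firstSearcher/newSearcher warming queries (QuerySenderListener) are a separate mechanism from this copy-based autowarming, which is the distinction Mark and Jay are drawing above.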
Re: Lucene FieldCache memory requirements
It also briefly requires more memory than just that - it allocates an array the size of maxdoc+1 to hold the unique terms - and then sizes down. Possibly we can use the getUnuiqeTermCount method in the flexible indexing branch to get rid of that - which is why I was thinking it might be a good idea to drop the unsupported exception in that method for things like multi reader and just do the work to get the right number (currently there is a comment that the user should do that work if necessary, making the call unreliable for this). Fuad Efendi wrote: Thank you very much Mike, I found it: org.apache.solr.request.SimpleFacets ... // TODO: future logic could use filters instead of the fieldcache if // the number of terms in the field is small enough. counts = getFieldCacheCounts(searcher, base, field, offset,limit, mincount, missing, sort, prefix); ... FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName); final String[] terms = si.lookup; final int[] termNum = si.order; ... So that 64-bit requires more memory :) Mike, am I right here? [(8 bytes pointer) + (4 bytes DocID)] x [Number of Documents (100mlns)] (64-bit JVM) 1.2Gb RAM for this... Or, may be I am wrong: For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. [8 bytes (64bit)] x [number of documents (100mlns)]? 0.8Gb Kind of Map between String and DocSet, saving 4 bytes... Key is String, and Value is array of 64-bit pointers to Document. Why 64-bit (for 64-bit JVM)? I always thought it is (int) documentId... Am I right? Thanks for pointing to http://issues.apache.org/jira/browse/LUCENE-1990! Note that for your use case, this is exceptionally wasteful. This is probably very common case... I think it should be confirmed by Lucene developers too... FieldCache is warmed anyway, even when we don't use SOLR... -Fuad -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: November-02-09 6:00 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements OK I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe up. For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. (Each also consume negligible (for your case) memory to hold the actual string values). Note that for your use case, this is exceptionally wasteful. If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, since you have only 10 unique string values. Mike On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote: I am not using Lucene API directly; I am using SOLR which uses Lucene FieldCache for faceting on non-tokenized fields... I think this cache will be lazily loaded, until user executes sorted (by this field) SOLR query for all documents *:* - in this case it will be fully populated... Subject: Re: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... 
I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000, without any impact of country field length; it requires 600,000,000 bytes: int is pointer to document (Lucene document ID), and long is pointer to String value... Am I right, is it 600Mb just for this country (indexed, non-tokenized, non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM requirements... I believe it shouldn't depend on cardinality (distribution) of field... Thanks, Fuad -- - Mark http://www.lucidimagination.com
Why does BinaryRequestWriter force the path to be base URL + /update/javabin
Hi folks, First of all, thanks for Solr. It is a great piece of work. I have a question about BinaryRequestWriter in the solrj project. Why does it force the path of UpdateRequests to be /update/javabin (see BinaryRequestWriter.getPath(String) starting on line 109)? I am extending BinaryRequestWriter specifically to remove this requirement and am interested to know the reasoning behind the initial choice. Thanks for your time, Stuart
Re: highlighting error using 1.4rc
Sorry - it was a bug in the backport from trunk to 2.9.1 - didn't realize that code didn't get hit because we didn't pass a null field - else the tests would have caught it. The fix has been committed, but I don't know whether it will make 2.9.1 or 1.4 because both have gotten the votes and time needed for release. Mark Miller wrote: Umm - crap. This looks like a bug in a fix that just went in. My fault on the review. I'll fix it tonight when I get home - unfortunately, both Lucene and Solr are about to be released... - Mark http://www.lucidimagination.com (mobile) On Nov 2, 2009, at 5:17 PM, Jake Brownell ja...@benetech.org wrote: Hi, I've tried installing the latest (3rd) RC for Solr 1.4 and Lucene 2.9.1. One of our integration tests, which runs against an embedded server, appears to be failing on highlighting. I've included the stack trace and the configuration from solrconfig. I'd appreciate any insights. Please let me know what additional information would be useful. Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at org.bookshare.search.solr.SolrSearchServerWrapper.query(SolrSearchServerWrapper.java:96) ... 29 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141) ... 32 more Caused by: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:489) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:484) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:249) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:230) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139) ...
32 more
I see the following highlighting configuration in our solrconfig.xml:
<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the default case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
    </lst>
  </fragmenter>
  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
      <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
  </formatter>
Re: Programmatically configuring SLF4J for Solr 1.4?
2009/11/1 Ryan McKinley ryan...@gmail.com I'm sure it is possible to configure JDK logging (java.util.logging) programmatically... but I have never had much luck with it. It is very easy to configure log4j programmatically, and this works great with Solr. Don't suppose I could trouble you for an example? I'm not terribly familiar with Java logging frameworks just yet.
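For reference, a minimal sketch of the kind of programmatic log4j setup Ryan describes, assuming the slf4j-log4j12 binding and the log4j jar are on the classpath in place of the default slf4j-jdk14 binding; the class name, pattern, and category levels here are only illustrative:

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class LoggingSetup {
  public static void configure() {
    Logger root = Logger.getRootLogger();
    root.removeAllAppenders();                            // drop anything picked up from log4j.properties
    root.addAppender(new ConsoleAppender(
        new PatternLayout("%d %-5p [%c] %m%n")));         // date, level, category, message
    root.setLevel(Level.INFO);
    Logger.getLogger("org.apache.solr").setLevel(Level.WARN);  // e.g. quiet Solr's INFO chatter
  }
}

Call configure() before the first Solr/SolrJ class is touched, so the earliest log statements already go through log4j.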
RE: Lucene FieldCache memory requirements
Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... no difference between maxdoc and maxdoc + 1 for such estimate... difference is between 0.4Gb and 1.2Gb... So, let's vote ;) A. [maxdoc] x [8 bytes ~ pointer to String object] B. [maxdoc] x [8 bytes ~ pointer to Document object] C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID] - same as [String1_Document_Count + ... + String10_Document_Count] x [4 bytes ~ DocumentID] D. [maxdoc] x [4 bytes + 8 bytes ~ my initial naive thinking...] Please confirm that it is Pointer to Object and not Lucene Document ID... I hope it is (int) Document ID... -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: November-02-09 6:52 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements It also briefly requires more memory than just that - it allocates an array the size of maxdoc+1 to hold the unique terms - and then sizes down. Possibly we can use the getUnuiqeTermCount method in the flexible indexing branch to get rid of that - which is why I was thinking it might be a good idea to drop the unsupported exception in that method for things like multi reader and just do the work to get the right number (currently there is a comment that the user should do that work if necessary, making the call unreliable for this). Fuad Efendi wrote: Thank you very much Mike, I found it: org.apache.solr.request.SimpleFacets ... // TODO: future logic could use filters instead of the fieldcache if // the number of terms in the field is small enough. counts = getFieldCacheCounts(searcher, base, field, offset,limit, mincount, missing, sort, prefix); ... FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName); final String[] terms = si.lookup; final int[] termNum = si.order; ... So that 64-bit requires more memory :) Mike, am I right here? [(8 bytes pointer) + (4 bytes DocID)] x [Number of Documents (100mlns)] (64-bit JVM) 1.2Gb RAM for this... Or, may be I am wrong: For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. [8 bytes (64bit)] x [number of documents (100mlns)]? 0.8Gb Kind of Map between String and DocSet, saving 4 bytes... Key is String, and Value is array of 64-bit pointers to Document. Why 64-bit (for 64-bit JVM)? I always thought it is (int) documentId... Am I right? Thanks for pointing to http://issues.apache.org/jira/browse/LUCENE-1990! Note that for your use case, this is exceptionally wasteful. This is probably very common case... I think it should be confirmed by Lucene developers too... FieldCache is warmed anyway, even when we don't use SOLR... -Fuad -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: November-02-09 6:00 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements OK I think someone who knows how Solr uses the fieldCache for this type of field will have to pipe up. For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. (Each also consume negligible (for your case) memory to hold the actual string values). Note that for your use case, this is exceptionally wasteful. 
If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, since you have only 10 unique string values. Mike On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi f...@efendi.ca wrote: I am not using Lucene API directly; I am using SOLR which uses Lucene FieldCache for faceting on non-tokenized fields... I think this cache will be lazily loaded, until user executes sorted (by this field) SOLR query for all documents *:* - in this case it will be fully populated... Subject: Re: Lucene FieldCache memory requirements Which FieldCache API are you using? getStrings? or getStringIndex (which is used, under the hood, if you sort by this field). Mike On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi f...@efendi.ca wrote: Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research Lucene source code, I hope someone can provide exact answer... Thanks Subject: Lucene FieldCache memory requirements Hi, Can anyone confirm Lucene FieldCache memory requirements? I have 100 millions docs with non-tokenized field country (10 different countries); I expect it requires array of (int, long), size of array 100,000,000,
Getting update/extract RequestHandler to work under Tomcat
Hoping someone might help with getting the /update/extract RequestHandler to work under Tomcat. Error 500 happens when trying to access http://localhost:8080/apache-solr-1.4-dev/update/extract/ (see below). Note /update/extract DOES work correctly under the Jetty-provided example. I think I must have a directory path incorrectly specified, but not sure where. No errors in the Catalina log on startup - only this: Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig INFO: created /update/extract: org.apache.solr.handler.extraction.ExtractingRequestHandler Solrconfig.xml under Tomcat is slightly changed from the example with regards to the lib elements: <lib dir="../contrib/extraction/lib" /> <lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" /> <lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />. The \contrib and \dist directories were copied directly below webapps\apache-solr-1.4-dev, unchanged from the example. In the Catalina log I see all the specified lib dirs added without error: INFO: Adding specified lib dirs to ClassLoader Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader (...many more...)
Solr Home is mapped to: INFO: SolrDispatchFilter.init() Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr Nov 2, 2009 7:10:47 PM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: C:\Program Files\Apache Software Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\' 500 Error: HTTP Status 500 - lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240) ... 17 more Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at
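One thing worth checking: ExtractingRequestHandler lives in the apache-solr-cell jar under dist/, not in contrib/extraction/lib (the classloader log above only shows the Tika dependencies being added), and relative <lib dir=.../> paths are resolved against the Solr home shown above, not Tomcat's working directory. Spelling the directories out absolutely removes the ambiguity; the paths below are only a guess based on the log output and are not a confirmed fix:

<lib dir="C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib" />
<lib dir="C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps/apache-solr-1.4-dev/dist" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="C:/Program Files/Apache Software Foundation/Tomcat 6.0/webapps/apache-solr-1.4-dev/dist" regex="apache-solr-clustering-\d.*\.jar" />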
Re: Lucene FieldCache memory requirements
Fuad Efendi wrote: Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... no difference between maxdoc and maxdoc + 1 for such estimate... difference is between 0.4Gb and 1.2Gb... I'm not sure I understand - but I didn't mean to imply the +1 on maxdoc meant anything. The issue is that in the end, it only needs a String array the size of String[UniqueTerms] - but because it can't easily figure out that number, it first creates an array of String[MaxDoc+1] - so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down. A pointer for each doc. -- - Mark http://www.lucidimagination.com
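To put rough numbers on that description for the example in this thread (100 million docs, one country field with about 10 unique values), here is a back-of-the-envelope sketch, assuming a 64-bit JVM and the FieldCache.StringIndex layout quoted earlier in the thread; the figures are estimates, not measurements:

public class FieldCacheEstimate {
  public static void main(String[] args) {
    long maxDoc = 100000000L;                 // hypothetical index size from this thread
    long ordArrayBytes   = maxDoc * 4;        // int[] order: one 4-byte ord per document (~400 MB, steady state)
    long tempLookupBytes = (maxDoc + 1) * 8;  // transient String[maxDoc+1] of references while loading (~800 MB)
    long finalLookupBytes = 11 * 8;           // trimmed lookup array: 10 countries + the null slot (negligible)
    System.out.printf("steady state ~%d MB, transient peak ~%d MB%n",
        (ordArrayBytes + finalLookupBytes) / (1024 * 1024),
        (ordArrayBytes + tempLookupBytes) / (1024 * 1024));
  }
}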
SolrJ looping until I get all the results
If I want to do a query and only return X number of rows at a time, but I want to keep querying until I get all the rows, how do I do that? Can I just keep advancing query.setStart(...) and then checking if server.query(query) returns any rows? Or is there a better way? Here's what I'm thinking:
final static int MAX_ROWS = 100;
int start = 0;
query.setRows(MAX_ROWS);
while (true) {
  QueryResponse resp = solrChunkServer.query(query);
  SolrDocumentList docs = resp.getResults();
  if (docs.size() == 0) break;
  start += MAX_ROWS;
  query.setStart(start);
}
-- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin
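A sketch of one way to structure that loop, using numFound from the response to decide when to stop; solrChunkServer and the query construction are assumed from the message above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PagingExample {
  static final int MAX_ROWS = 100;

  static void visitAll(SolrServer server, SolrQuery query) throws SolrServerException {
    query.setRows(MAX_ROWS);
    int start = 0;
    long numFound;
    do {
      query.setStart(start);
      QueryResponse resp = server.query(query);
      SolrDocumentList docs = resp.getResults();
      numFound = docs.getNumFound();          // total matches, so we know when to stop
      if (docs.isEmpty()) break;              // safety net if the index shrank mid-loop
      for (SolrDocument doc : docs) {
        // handle one document at a time instead of accumulating them all in memory
      }
      start += docs.size();
    } while (start < numFound);
  }
}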
RE: Lucene FieldCache memory requirements
I just did some tests on a completely new index (a slave). Sorting by a non-tokenized field with few distinct values (such as Country) takes milliseconds, but a sort (ascending) on a tokenized field with a heavy distribution took 30 seconds (initially). A second sort (descending) took milliseconds. Generic query *:*; FieldCache is not used for tokenized fields... so how is it sorted :) Fortunately, no OOM. -Fuad
RE: Lucene FieldCache memory requirements
Mark, I don't understand this: "so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down." Sizes down??? Why is it called a Cache then? And how does SOLR use it if it is not a cache? And this: "A pointer for each doc." Why can't we use an (int) DocumentID? For me, that is natural; a 64-bit pointer to an object in RAM is not natural (in the Lucene world)... So, is it [maxdoc] x [4 bytes], or [maxdoc] x [8 bytes]?... -Fuad
Re: adding and updating a lot of document to Solr, metadata extraction etc
About large XML files and http overhead: you can tell solr to load the file directly from a file system. This will stream thousands of documents in one XML file without loading everything in memory at once. This is a new book on Solr. It will help you through this early learning phase. http://www.packtpub.com/solr-1-4-enterprise-search-server On Mon, Nov 2, 2009 at 6:24 AM, Alexey Serba ase...@gmail.com wrote: Hi Eugene, - ability to iterate over all documents, returned in search, as Lucene does provide within a HitCollector instance. We would need to extract and aggregate various fields, stored in index, to group results and aggregate them in some way. Also I did not find any way in the tutorial to access the search results with all fields to be processed by our application. http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr Check out Faceted Search, probably you can achieve your goal by using Facet Component There's also Field Collapsing patch http://wiki.apache.org/solr/FieldCollapsing Alex -- Lance Norskog goks...@gmail.com
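As a concrete illustration of the point about loading the file from the file system: with remote streaming enabled in solrconfig.xml (enableRemoteStreaming="true" on the requestDispatcher), a request along these lines asks Solr to read and stream the XML from its own disk; the host, port, and path below are placeholders:

http://localhost:8983/solr/update?stream.file=/data/feeds/docs.xml&stream.contentType=text/xml;charset=utf-8&commit=true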
Re: SolrJ looping until I get all the results
On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh avl...@gmail.com wrote: I was doing it that way, but what I'm doing with the documents is do some manipulation and put the new classes into a different list. Because I basically have two times the number of documents in lists, I'm running out of memory. So I figured if I do it 1000 documents at a time, the SolrDocumentList will get garbage collected at least. You are right w.r.t to all that but I am surprised that you would need ALL the documents from the index for a search requirement. This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. -- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin
Re: SolrJ looping until I get all the results
This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. Okay. I am sure you are aware of the fl parameter which restricts the number of fields returned back with a response. If you need limited info, it might be a good idea to use this parameter. Cheers Avlesh On Tue, Nov 3, 2009 at 7:23 AM, Paul Tomblin ptomb...@xcski.com wrote: On Mon, Nov 2, 2009 at 8:47 PM, Avlesh Singh avl...@gmail.com wrote: I was doing it that way, but what I'm doing with the documents is do some manipulation and put the new classes into a different list. Because I basically have two times the number of documents in lists, I'm running out of memory. So I figured if I do it 1000 documents at a time, the SolrDocumentList will get garbage collected at least. You are right w.r.t to all that but I am surprised that you would need ALL the documents from the index for a search requirement. This isn't a search, this is a search and destroy. Basically I need the file names of all the documents that I've indexed in Solr so that I can delete them. -- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin
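A small sketch of that suggestion in SolrJ terms, assuming the field holding the file name is literally called filename; combined with the paging loop from earlier in the thread, each page then carries only the one field:

import org.apache.solr.client.solrj.SolrQuery;

public class FilenameQuery {
  public static SolrQuery build() {
    SolrQuery query = new SolrQuery("*:*");
    query.setFields("filename");   // equivalent to adding fl=filename to the request
    query.setRows(1000);
    return query;
  }
}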
RE: Lucene FieldCache memory requirements
I believe this is the correct estimate: C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID] - same as [String1_Document_Count + ... + String10_Document_Count + ...] x [4 bytes per DocumentID] So, for 100 million docs we need 400Mb for each(!) non-tokenized field. Although FieldCacheImpl is based on WeakHashMap (somewhere...), we can't rely on sizing down with SOLR faceting features. I think I finally found the answer...
/** Expert: Stores term text values and document ordering data. */
public static class StringIndex {
  ...
  /** All the term values, in natural order. */
  public final String[] lookup;
  /** For each document, an index into the lookup array. */
  public final int[] order;
  ...
}
Another API:
/** Checks the internal cache for an appropriate entry, and if none
 * is found, reads the term values in <code>field</code> and returns an array
 * of size <code>reader.maxDoc()</code> containing the value each document
 * has in the given field.
 * @param reader Used to get field values.
 * @param field Which field contains the strings.
 * @return The values in the given field for each document.
 * @throws IOException If any error occurs.
 */
public String[] getStrings (IndexReader reader, String field) throws IOException;
Looks similar; the cache size is [maxdoc]; however, the values stored are 8-byte pointers on a 64-bit JVM.
private Map<Class<?>,Cache> caches;
private synchronized void init() {
  caches = new HashMap<Class<?>,Cache>(7);
  ...
  caches.put(String.class, new StringCache(this));
  caches.put(StringIndex.class, new StringIndexCache(this));
  ...
}
StringCache and StringIndexCache use WeakHashMap internally... but the objects won't ever be garbage collected in a faceted production system... SOLR SimpleFacets doesn't use the getStrings API, so the hope is that memory requirements are minimized. However, Lucene may use it internally for some queries (or, for instance, to get access to a non-tokenized cached field without reading the index)... To be safe, use this in your basic memory estimates: [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes] -Fuad -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: November-02-09 7:37 PM To: solr-user@lucene.apache.org Subject: RE: Lucene FieldCache memory requirements Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... no difference between maxdoc and maxdoc + 1 for such an estimate... the difference is between 0.4Gb and 1.2Gb... So, let's vote ;) A. [maxdoc] x [8 bytes ~ pointer to String object] B. [maxdoc] x [8 bytes ~ pointer to Document object] C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID] - same as [String1_Document_Count + ... + String10_Document_Count] x [4 bytes ~ DocumentID] D. [maxdoc] x [4 bytes + 8 bytes ~ my initial naive thinking...] Please confirm that it is a Pointer to Object and not a Lucene Document ID... I hope it is an (int) Document ID... -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: November-02-09 6:52 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements It also briefly requires more memory than just that - it allocates an array the size of maxdoc+1 to hold the unique terms - and then sizes down. Possibly we can use the getUniqueTermCount method in the flexible indexing branch to get rid of that - which is why I was thinking it might be a good idea to drop the unsupported exception in that method for things like multi readers and just do the work to get the right number (currently there is a comment that the user should do that work if necessary, making the call unreliable for this).
Fuad Efendi wrote: Thank you very much Mike, I found it: org.apache.solr.request.SimpleFacets ... // TODO: future logic could use filters instead of the fieldcache if // the number of terms in the field is small enough. counts = getFieldCacheCounts(searcher, base, field, offset,limit, mincount, missing, sort, prefix); ... FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName); final String[] terms = si.lookup; final int[] termNum = si.order; ... So that 64-bit requires more memory :) Mike, am I right here? [(8 bytes pointer) + (4 bytes DocID)] x [Number of Documents (100mlns)] (64-bit JVM) 1.2Gb RAM for this... Or, may be I am wrong: For Lucene directly, simple strings would consume an pointer (4 or 8 bytes depending on whether your JRE is 64bit) per doc, and the string index would consume an int (4 bytes) per doc. [8 bytes (64bit)] x [number of documents (100mlns)]? 0.8Gb Kind of Map between String and DocSet, saving 4 bytes... Key is String, and Value is array of 64-bit pointers to Document. Why 64-bit (for
RE: Lucene FieldCache memory requirements
Hi Mark, Yes, I understand it now; however, how will StringIndexCache size down in a production system faceting by Country on a homepage? This is SOLR specific... Lucene specific: Lucene doesn't read from disk if it can retrieve a field value for a specific document ID from the cache. How will it size down in a purely Lucene-based, heavily loaded production system? Especially if this cache is used for query optimizations. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: November-02-09 8:53 PM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements
static final class StringIndexCache extends Cache {
  StringIndexCache(FieldCache wrapper) {
    super(wrapper);
  }

  @Override
  protected Object createValue(IndexReader reader, Entry entryKey) throws IOException {
    String field = StringHelper.intern(entryKey.field);
    final int[] retArray = new int[reader.maxDoc()];
    String[] mterms = new String[reader.maxDoc()+1];
    TermDocs termDocs = reader.termDocs();
    TermEnum termEnum = reader.terms(new Term(field));
    int t = 0;  // current term number

    // an entry for documents that have no terms in this field
    // should a document with no terms be at top or bottom?
    // this puts them at the top - if it is changed, FieldDocSortedHitQueue
    // needs to change as well.
    mterms[t++] = null;

    try {
      do {
        Term term = termEnum.term();
        if (term == null || term.field() != field) break;

        // store term text
        // we expect that there is at most one term per document
        if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
            "documents in field \"" + field + "\", but it's impossible to sort on " +
            "tokenized fields");
        mterms[t] = term.text();

        termDocs.seek(termEnum);
        while (termDocs.next()) {
          retArray[termDocs.doc()] = t;
        }

        t++;
      } while (termEnum.next());
    } finally {
      termDocs.close();
      termEnum.close();
    }

    if (t == 0) {
      // if there are no terms, make the term array
      // have a single null entry
      mterms = new String[1];
    } else if (t < mterms.length) {
      // if there are less terms than documents,
      // trim off the dead array space
      String[] terms = new String[t];
      System.arraycopy(mterms, 0, terms, 0, t);
      mterms = terms;
    }

    StringIndex value = new StringIndex(retArray, mterms);
    return value;
  }
};
The formula for a StringIndex FieldCache entry is essentially the String array of unique terms (which does indeed size down at the bottom) and the int array indexing into the String array. Fuad Efendi wrote: To be correct, I analyzed FieldCache awhile ago and I believed it never sizes down...
/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
 * <p>Created: May 19, 2004 4:40:36 PM
 *
 * @since lucene 1.4
 */
Will it size down? Only if we are not faceting (as in SOLR v.1.3)... And I am still unsure: Document ID vs. Object Pointer. I don't understand this: "so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down." Sizes down??? Why is it called a Cache then? And how does SOLR use it if it is not a cache? -- - Mark http://www.lucidimagination.com
RE: Lucene FieldCache memory requirements
Even in simplistic scenario, when it is Garbage Collected, we still _need_to_be_able_ to allocate enough RAM to FieldCache on demand... linear dependency on document count... Hi Mark, Yes, I understand it now; however, how will StringIndexCache size down in a production system faceting by Country on a homepage? This is SOLR specific... Lucene specific: Lucene doesn't read from disk if it can retrieve field value for a specific document ID from cache. How will it size down in purely Lucene-based heavy-loaded production system? Especially if this cache is used for query optimizations.
Re: Why does BinaryRequestWriter force the path to be base URL + /update/javabin
Yup, that can be relaxed. It was just a convention. On Tue, Nov 3, 2009 at 5:24 AM, Stuart Tettemer stette...@gmail.com wrote: Hi folks, First of all, thanks for Solr. It is a great piece of work. I have a question about BinaryRequestWriter in the solrj project. Why does it force the path of UpdateRequests to be /update/javabin (see BinaryRequestWriter.getPath(String) starting on line 109)? I am extending BinaryRequestWriter specifically to remove this requirement and am interested to know the reasoning behind the initial choice. Thanks for your time, Stuart -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Question regarding snapinstaller
In POSIX-compliant systems (basically, Unix system calls) a file exists independently of its file names, and there can be multiple names for a file. If a program has a file open, that file can be deleted, but it will still exist until the program closes it (or exits). In the snapinstaller cycle, Solr holds the old index files open while snapinstaller swaps in the new set. The 'commit' operation causes Solr to (eventually) close all of the old index files, and at that point they will go away. On Mon, Nov 2, 2009 at 1:26 PM, Prasanna Ranganathan pranganat...@netflix.com wrote: It looks like the snapinstaller script does an atomic remove and replace of the entire solr_home/data_dir/index folder with the contents of the new snapshot before issuing a commit command. I am trying to understand the implications of this. What happens to queries that come in during the interval between the instant the existing directory is removed and the commit command gets finalized? Does a currently running instance of Solr not need the files in the index folder to serve query results? Are all the contents of the index folder loaded into memory? Thanks in advance for any help. Regards, Prasanna. -- Lance Norskog goks...@gmail.com
Re: Annotations and reference types
I guess this is not a very good idea. The document itself is a flat data structure; it is hard to see it as a nested data structure. If allowed, how deep would we wish to make it? The simple solution would be to write setters for b_id and b_name in class A, and the setters can inject the values into B. On Mon, Nov 2, 2009 at 10:05 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Oct 29, 2009 at 7:57 PM, M. Tinnemeyer marc-...@gmx.net wrote: Dear listusers, Is there a way to store an instance of class A (including the fields from myB) via solr using annotations? The index should look like: id; name; b_id; b_name --
Class A {
  @Field private String id;
  @Field private String name;
  @Field private B myB;
}
--
Class B {
  @Field("b_id") private String id;
  @Field("B_name") private String name;
}
No. I guess you want to represent certain fields of class B and have them as an attribute in class A (but all fields belong to the same schema); then it can be a worthwhile addition to Solrj. Can you open an issue? A patch would be even better :) -- Regards, Shalin Shekhar Mangar.
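A sketch of the setter-based workaround suggested at the top of this message: keep the bean flat for Solrj and let annotated setters rebuild the nested B instance. The setter names are illustrative, and B is assumed to gain plain setters and a no-arg constructor (the @Field annotation can be placed on setter methods as well as on fields):

import org.apache.solr.client.solrj.beans.Field;

public class A {
  @Field private String id;
  @Field private String name;

  private final B myB = new B();   // rebuilt from the flat b_* fields below

  @Field("b_id")
  public void setBId(String bId) { myB.setId(bId); }

  @Field("b_name")
  public void setBName(String bName) { myB.setName(bName); }

  public B getMyB() { return myB; }
}

class B {
  private String id;
  private String name;
  void setId(String id) { this.id = id; }
  void setName(String name) { this.name = name; }
}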
Re: field queries seem slow
This searches author:albert and (default text field):einstein. This may not be what you expect. On Mon, Nov 2, 2009 at 2:30 PM, Erick Erickson erickerick...@gmail.com wrote: H, are you sorting? And have your readers been reopened? Is the second query of that sort also slow? If the answer to this last question is no, have you tried some autowarming queries? Best Erick On Mon, Nov 2, 2009 at 4:34 PM, mike anderson saidthero...@gmail.com wrote: I took a look through my Solr logs this weekend and noticed that the longest queries were on particular fields, like author:albert einstein. Is this result consistent with other setups out there? If not, is there a trick to make these go faster? I've read up on filter queries and use those when applicable, but they don't really solve all my problems. If anybody wants to take a shot at it but needs to see my solrconfig, etc., just let me know. Cheers, Mike -- Lance Norskog goks...@gmail.com
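To spell that point out: in the Lucene/Solr query syntax the field prefix only binds to the term immediately after it, so author:albert einstein sends einstein to the default field. Quoting the phrase or grouping the terms keeps both in the author field; the sketch below uses SolrJ, but the same query strings work in a raw q parameter:

import org.apache.solr.client.solrj.SolrQuery;

public class AuthorQueries {
  public static SolrQuery phrase() {
    return new SolrQuery("author:\"albert einstein\"");     // exact phrase in the author field
  }
  public static SolrQuery bothTerms() {
    return new SolrQuery("author:(albert AND einstein)");    // both terms, any order, same field
  }
}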
Re: tracking solr response time
On Mon, Nov 2, 2009 at 2:21 PM, bharath venkatesh bharathv6.proj...@gmail.com wrote: we observed many times there is huge mismatch between qtime and time measured at the client for the response Long times to stream back the result to the client could be due to - client not reading fast enough - network congestion - reading the stored fields takes a long time - this can happen with really big indexes that can't all fit in memory, and stored fields tend to not be cached well by the OS (essentially random access patterns over a huge area). This ends up causing a disk seek per document being streamed back. - locking contention for reading the index (under Solr 1.3, but not under 1.4 on non-windows platforms) I didn't see where you said what Solr version you were using. There are some pretty big concurrency differences between 1.3 and 1.4 too (if your tests involve many concurrent requests). -Yonik http://www.lucidimagination.com
RE: Lucene FieldCache memory requirements
FieldCache internally uses a WeakHashMap... nothing wrong with that, but... no amount of Garbage Collection tuning will help if the allocated RAM is not enough to keep those Weak** entries around as if they were Strong**, especially for SOLR faceting... 10%-15% of CPU taken by GC has been reported... -Fuad
Proper way to set up Multi Core / Core admin
Getting started with multi core setup following http://wiki.apache.org/solr/CoreAdmin and the book. Generally everything makes sense, but I have one question. Here's how easy it was:
1. place the solr.war into the server
2. create your core directories in the newly created solr/ directory
3. set up solr.xml, the config files for a data import handler, the [core]/conf/solrconfig.xml, [core]/conf/schema.xml, etc
4. copy the /admin directory present in /solr into each /solr/[core] directory
Is step 4 a correct step in the setting up of a multi core environment? TIA
Re: Proper way to set up Multi Core / Core admin
Sorry for the confusion - step four is to be avoided, obviously. On Nov 2, 2009, at 11:46 PM, Jonathan Hendler wrote: Getting started with multi core setup following http://wiki.apache.org/solr/CoreAdmin and the book. Generally everything makes sense, but I have one question. Here's how easy it was: place the solr.war into the server create your core directories in the newly created solr/ directory set up solr.xml, the config files for a data import handler, the [core]/conf/solrconfig.xml [core]/conf/schema.xml, etc copy the /admin directory present in /solr into each /solr/[core] directory Is step 4 a correct step in the setting up of a multi core environment? TIA
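For reference, a minimal multi core layout only needs solr.xml plus a conf/ directory per core; the admin pages are served by the webapp itself, so nothing is copied into the core directories. A sketch with two hypothetical cores:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

Each instanceDir then holds its own conf/solrconfig.xml and conf/schema.xml, and the cores are reachable at /solr/core0/... and /solr/core1/...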
Re: Match all terms in doc
On Sun, Nov 1, 2009 at 3:33 AM, Magnus Eklund magnus.ekl...@gmail.com wrote: Hi How do I restrict hits to documents containing all words (regardless of order) of a query in a particular field? Suppose I have two documents with a field called name in my index: doc1 = name: Pink doc2 = name: Pink Floyd When querying for Pink I want only doc1, and when querying for Pink Floyd or Floyd Pink I want doc2. You can query like: +name:Floyd +name:Pink The + character means a must-have condition. This will match documents which have Floyd as well as Pink, in any order. -- Regards, Shalin Shekhar Mangar.
Re: solrj query size limit?
Did you hit the limit for maximum number of characters in a GET request? Cheers Avlesh On Tue, Nov 3, 2009 at 9:36 AM, Gregg Horan greggho...@gmail.com wrote: I'm constructing a query using solrj that has a fairly large number of 'OR' clauses. I'm just adding it as a big string to setQuery(), in the format accountId:(this OR that OR yada). This works all day long with 300 values. When I push it up to 350-400 values, I get a Bad Request SolrServerException. It appears to just be a client error - nothing reaching the server logs. Very repeatable... dial it back down and it goes through again fine. The total string length of the query (including a handful of other faceting entries) is about 9500chars. I do have the maxBooleanClauses jacked up to 2048. Using javabin. 1.4-dev. Are there any other options or settings I might be overlooking? -Gregg
Re: Problems downloading lucene 2.9.1
Thanks guys !!! 2009/11/2 Ryan McKinley ryan...@gmail.com On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote: On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote: Hi folks, as we are using a snapshot dependency on solr1.4, today we are getting problems when maven tries to download lucene 2.9.1 (there isn't any 2.9.1 there). Which repository can I use to download it? They won't be there until 2.9.1 is officially released. We are trying to speed up the Solr release by piggybacking on the Lucene release, but this little bit is the one downside. Until then, you can add a repo to: http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/ -- Lici
Re: Problems downloading lucene 2.9.1
Well, I've solved this problem by executing mvn install:install-file -DgroupId=org.apache.lucene -DartifactId=lucene-analyzers -Dversion=2.9.1 -Dpackaging=jar -Dfile=path_to_jar for each lucene-* artifact. I think there must be an easier way to do this, am I wrong? Hope it helps Thx On 3 November 2009 at 08:03, Licinio Fernández Maurelo licinio.fernan...@gmail.com wrote: Thanks guys !!! 2009/11/2 Ryan McKinley ryan...@gmail.com On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote: On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote: Hi folks, as we are using a snapshot dependency on solr1.4, today we are getting problems when maven tries to download lucene 2.9.1 (there isn't any 2.9.1 there). Which repository can I use to download it? They won't be there until 2.9.1 is officially released. We are trying to speed up the Solr release by piggybacking on the Lucene release, but this little bit is the one downside. Until then, you can add a repo to: http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/ -- Lici -- Lici
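An alternative to installing each jar by hand is to declare the staging repository Ryan pointed to earlier in your pom.xml until 2.9.1 reaches the central repository; a sketch (the repository id is arbitrary):

<repositories>
  <repository>
    <id>lucene-2.9.1-staging</id>
    <url>http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/</url>
  </repository>
</repositories>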