Re: memory size

2009-11-11 Thread David Stuart

Hi
This is a PHP problem. You need to increase the per-script memory
limit in your php.ini; the setting is called memory_limit.
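
For reference, the relevant php.ini directive looks like this (the value is
only an example; size it to your result sets):

    memory_limit = 128M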


Regards

David

On 11 Nov 2009, at 07:56, Jörg Agatz joerg.ag...@googlemail.com  
wrote:



Hello,

I have a problem with the memory size, but I don't know how I can
fix it.

Maybe it is a PHP problem, but I don't know.

My Error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 16515072 bytes)


I hope you can help me

KinGArtus


Re: memory size

2009-11-11 Thread Ritesh Gurung
It depends on the number of rows being fetched from Solr, the PHP
configuration, and the Solr response writer you are using (json, xml, etc.).
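
For example, the response size can be capped on the Solr side before PHP
ever parses it; rows and wt below are standard query parameters (host and
query are examples):

    http://localhost:8983/solr/select?q=*:*&rows=20&wt=json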

Rgds,
Ritesh Gurung
David Stuart wrote:
 Hi
 This is a PHP problem. You need to increase the per-script memory
 limit in your php.ini; the setting is called memory_limit.

 Regards

 David

 On 11 Nov 2009, at 07:56, Jörg Agatz joerg.ag...@googlemail.com wrote:

 Hello,

 I have a problem with the memory size, but I don't know how I can
 fix it.

 Maybe it is a PHP problem, but I don't know.

 My Error:

 Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
 allocate 16515072 bytes)


 I hope you can help me

 KinGArtus




How to get WildCard/prefix in SolrSharp

2009-11-11 Thread theashik

In Solrj, there is a method called setAllowLeadingWildcard(true). I need to
call the same method in the SolrSharp API as well, but I don't find the class
SolrQueryParser.cs in SolrSharp. Can anyone suggest how I can call that
method, or whether there is a provided namespace such as
org.apache.solr.SolrSharp.Search.SolrQueryParser in SolrSharp.


Thank you
Ashik Rajbhandari



Re: memory size

2009-11-11 Thread Jörg Agatz
I have changed the php.ini and now it works...
It was a problem in PHP: because I group the results in PHP, I need more
memory when there are many results.

Thanks for the Help


Re: How to get WildCard/prefix in SolrSharp

2009-11-11 Thread Mauricio Scheffer
AFAIK this needs to be set in the config in your case, which is still an
open issue: http://issues.apache.org/jira/browse/SOLR-218

On Wed, Nov 11, 2009 at 9:25 AM, theashik theas...@hotmail.com wrote:


 In Solrj, there is a method called setAllowLeadingWildcard(true). I need to
 call the same method in the SolrSharp API as well, but I don't find the class
 SolrQueryParser.cs in SolrSharp. Can anyone suggest how I can call that
 method, or whether there is a provided namespace such as
 org.apache.solr.SolrSharp.Search.SolrQueryParser in SolrSharp.


 Thank you
 Ashik Rajbhandari




Commit error

2009-11-11 Thread Licinio Fernández Maurelo
Hi folks,

I'm getting this error while committing after a dataimport of only 12 docs!

Exception while solr commit.
java.io.IOException: background merge hit exception: _3kta:C2329239
_3ktb:c11-_3ktb into _3ktc [optimize] [mergeDocStores]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2750)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:138)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:66)
at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:170)
at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:208)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.io.IOException: No hay espacio libre en el dispositivo
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
at
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:191)
at
org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
at
org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
at
org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75)
at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:45)
at
org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:229)
at
org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:184)
at
org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:217)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5089)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4589)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

Index info: 2,600,000 docs | 11 GB size
System info: 15 GB free disk space

When attempting to commit, the disk usage increases until Solr breaks... it
looks like 15 GB is not enough space to do the merge/optimize.

Any advice?

-- 
Lici


Re: Commit error

2009-11-11 Thread Israel Ekpo
2009/11/11 Licinio Fernández Maurelo licinio.fernan...@gmail.com

 Hi folks,

 I'm getting this error while committing after a dataimport of only 12 docs!

 Exception while solr commit.
 java.io.IOException: background merge hit exception
 [full stack trace snipped; identical to the original message above]

 Index info: 2,600,000 docs | 11 GB size
 System info: 15 GB free disk space

 When attempting to commit, the disk usage increases until Solr breaks... it
 looks like 15 GB is not enough space to do the merge/optimize.

 Any advice?

 --
 Lici



Hi Licinio,

During the optimization process, the index size can be approximately
double what it was originally, and the remaining space on disk may not be
enough for the task.

What you are describing sounds exactly like that.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Commit error

2009-11-11 Thread Licinio Fernández Maurelo
Thanks Israel, I've done a successful import using optimize=false.
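
For reference, the corresponding full-import call looks like this (host and
handler path are the stock examples):

    http://localhost:8983/solr/dataimport?command=full-import&optimize=false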

2009/11/11 Israel Ekpo israele...@gmail.com

 [nested quote with stack trace snipped; see the original message above]

 Hi Licinio,

 During the optimization process, the index size can be approximately
 double what it was originally, and the remaining space on disk may not be
 enough for the task.

 What you are describing sounds exactly like that.
 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.




-- 
Lici


Re: deployment questions

2009-11-11 Thread Joel Nylund

Anyone?

I have done more reading and testing and it seems like I want to:

Use SolrJ and embed Solr in my webapp, but I want to disable HTTP
access to Solr, meaning force all calls through the SolrJ interface I
am building (no admin access, etc.).


Is there a simple way to do this?

Am I better off running Solr as a server on its own and using network
security?


thanks
Joel

On Nov 9, 2009, at 5:04 PM, Joel Nylund wrote:


Hi,

I have a Java app that is deployed in a JBoss/Tomcat container. I
would like to add my Solr index to it. I have read about this and it
seems fairly straightforward, but I'm curious about the best way to
secure it.


I require my users to log in to my app to use it, so I want the
search functions to behave the same way. Ideally I would like to do
the Solr queries from the client using ajax/json calls.


So given this, my thinking was I should wrap the Solr servlet and
put a local proxy-type interface in front to ensure security. Is there
an easier way to do this, or an example of a good way to do this? Or
does the Solr servlet support an interceptor-type pattern where I
can have it call a piece of code before it executes the call? (This
application is old and not using standard J2EE security, so I don't
think I can use that.)



Another option is to do SolrJ on the server and not do the client-side
calls; in this case I think I could lock down the Solr servlet
interface to only allow local calls.


thanks
Joel
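
For the embedded option discussed above, a minimal SolrJ sketch of the
startup (Solr 1.4-era API; the solr home path is an example):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public static SolrServer startEmbedded() throws Exception {
        // No HTTP listener is started; all access goes through the returned object.
        System.setProperty("solr.solr.home", "/opt/myapp/solr"); // example path
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        return new EmbeddedSolrServer(container, ""); // "" = default core
    }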





${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery

2009-11-11 Thread Mark Ellul
Hi,

I have an interesting issue...

I am trying to run delta imports on Solr 1.4 against a PostgreSQL 8.3
database.

When I run a delta import with the entity below, I get an exception
(see below the entity definition) showing the query it is trying to
run, and you can see that it is not populating the where clause of my
deltaImportQuery.

I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id}
and get the same exceptions.

Am I missing something obvious?

Any help would be appreciated!

Regards

Mark


<entity name="Tweeter" pk="twitter_id"
        query="select twitter_id,
                      twitter_id as pk,
                      1 as site_id,
                      screen_name
               from api_tweeter
               where tweet_mapreduce_on IS NOT NULL;"
        transformer="TemplateTransformer"
        deltaImportQuery="select twitter_id,
                                 twitter_id as pk,
                                 1 as site_id,
                                 screen_name
                          from api_tweeter
                          where twitter_id=${dataimporter.delta.twitter_id };"
        deltaQuery="select twitter_id from api_tweeter where modified_on >
                    '${dataimporter.last_index_time}' and tweet_mapreduce_on IS NOT NULL;">

  <field name="twitter_id" column="twitter_id" />

</entity>


INFO: Completed parentDeltaQuery for Entity: Tweeter
Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
SEVERE: Exception while processing: Tweeter document : SolrInputDocument[{}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query:  select twitter_id,twitter_id
as pk,1 as site_id,   screen_name   from api_tweeter where
twitter_id=;Processing Document # 1
 at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
 at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
 at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
 at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
 at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
 at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at end of
input
  Position: 1197
at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
 at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
 at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:345)
 at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:246)
... 11 more
Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query:  select twitter_id,twitter_id
as pk,1 as site_id,   screen_name  from api_tweeter where
twitter_id=;Processing Document # 1
 at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
 at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
 at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
 at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
 at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
 at

Re: synonym payload boosting

2009-11-11 Thread David Ginzburg
Hi,
I have added a PayloadTermQueryPlugin after reading
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

My class is:

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.SolrException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.*;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.index.Term;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;

public class PayloadTermQueryPlugin extends QParserPlugin {
    private MinPayloadFunction payloadFunc;

    @Override
    public void init(NamedList args) {
        this.payloadFunc = new MinPayloadFunction();
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws ParseException {
                Term term = new Term(localParams.get(QueryParsing.F),
                        localParams.get(QueryParsing.V));
                return new PayloadTermQuery(term, payloadFunc, false);
            }
        };
    }
}


I tested it using Solrj

@Override
protected void setUp() throws Exception {
    super.setUp();
    System.setProperty("solr.solr.home", "C:\\temp\\solr_home1.4");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();

    try {
        coreContainer = initializer.initialize();
    } catch (IOException ex) {
        Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
    } catch (ParserConfigurationException ex) {
        Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
    } catch (SAXException ex) {
        Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
    }
    server = new EmbeddedSolrServer(coreContainer, "");
}

public void testSeacrhAndBoost() {
    SolrQuery query = new SolrQuery();
    query.setQuery("PFirstName:steve");
    query.setParam("hl.fl", "PFirstName");
    query.setParam("defType", "payload");
    query.setIncludeScore(true);
    query.setRows(10);
    query.setFacet(false);

    try {
        QueryResponse qr = server.query(query);
        List<PersonDoc> l = qr.getBeans(PersonDoc.class);
        for (PersonDoc personDoc : l) {
            System.out.println(personDoc);
        }
    } catch (SolrServerException ex) {
        Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
    }
}


I get an NPE trying to access localParams in the createParser(String
qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)
method. The NPE is actually in the parse() method.

I could not find documentation about the parse method. How can I pass
the localParams? What is the difference between the localParams and params?
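
For what it's worth, local params are normally supplied inline in the query
string itself, so the parser sees them; assuming the plugin is registered
under the name payload in solrconfig.xml, the query would be written along
these lines:

    q={!payload f=PFirstName v=steve}

params are the plain request parameters (q, rows, ...), while localParams is
the {!...} prefix of the query; when the query carries no such prefix,
localParams can be null, which would explain the NPE.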


I would be happy to write a case study on the wiki, but I'm not sure
exactly what you mean: the resolution I will eventually come to, or the
process of finding it? I'm still trying to figure out what exactly to do.
I have purchased the Solr 1.4 book, but it doesn't seem to have much
information about my needs.

On Tue, Nov 10, 2009 at 10:09, David Ginzburg da...@digitaltrowel.comwrote:

 I would be happy to.
 I'm not sure exactly what you mean- The resolution i will eventually come
 to or the process of finding it?
 I'm still trying to figure out what exactly to do.  I have purchased the
 Solr 1.4 book , but it doesn't seem to have much information about my needs.


 -- Forwarded message --
 From: Lance Norskog goks...@gmail.com
 Date: Tue, Nov 10, 2009 at 04:11
 Subject: Re: synonym payload boosting
 To: solr-user@lucene.apache.org


 David, when you get this working would you consider writing a case
 study on the wiki? Nothing complex, just something that describes how
 you did several customizations to create a new feature.

 On Mon, Nov 9, 2009 at 4:10 AM, Grant Ingersoll gsing...@apache.org
 wrote:
 
  On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:
 
  I have found this
 
 
 https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  patch
  But i don't want to use any function, just the normal scoring and the
  similarity class  I have written.
  Can you point me to  modifications I need (if any) ?
 
 
 
  Ahmet's point is that you need some query that will actually invoke the
  payload in scoring.  PayloadTermQuery and 

Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery

2009-11-11 Thread Mark Ellul
I have 2 entities from the root node, not sure if that makes a difference!

On Wed, Nov 11, 2009 at 4:49 PM, Mark Ellul m...@catalystic.com wrote:

 [quoted message snipped; see the original message above]

Re: Is optimize / optimized?

2009-11-11 Thread Otis Gospodnetic
Yes. I believe the "is the index already optimized" check is in the guts of Lucene.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: William Pierce evalsi...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, October 23, 2009 1:05:52 PM
 Subject: Is optimize / optimized?
 
 Folks:
 
 If I issue two <optimize/> requests with no intervening changes to the index,
 will the second optimize request be smart enough to not do anything?
 
 Thanks,
 
 Bill



Re: NGram query failing

2009-11-11 Thread Otis Gospodnetic
That's actually easy to explain/understand.
If the min n-gram size is 3, a query term with just 2 characters will never
match any terms that originally had more than 2 characters, because longer
terms never get tokenized into tokens below 3 characters.

Take the term: house
house => hou, ous, use

If your search term is ho, it will never match the above, as there is no
term ho in there.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Charlie Jackson charlie.jack...@cision.com
 To: solr-user@lucene.apache.org
 Sent: Fri, October 23, 2009 4:32:33 PM
 Subject: RE: NGram query failing
 
 
 Well, I fixed my own problem in the end. For the record, this is the
 schema I ended up going with:
 
 
 
 
 
 
 [schema XML stripped by the list archive; the surviving fragments show an
 index analyzer and a query analyzer, each with an n-gram filter using
 minGramSize=2]
 
 
 
 I could have left it a trigram but went with a bigram because with this
 setup, I can get queries to properly hit as long as the min/max gram
 size is met. In other words, for any queries two or more characters
 long, this works for me. Less than two characters and it fails. 
 
 I don't know exactly why that is, but I'll take it anyway!
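
 Since the archive stripped the XML, here is a sketch of the kind of field
 type described (the type name, tokenizer choice, and maxGramSize are
 assumptions; only minGramSize=2 survives in the fragments above):

     <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
       </analyzer>
     </fieldType>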
 
 - Charlie
 
 
 -Original Message-
 From: Charlie Jackson [mailto:charlie.jack...@cision.com] 
 Sent: Friday, October 23, 2009 10:00 AM
 To: solr-user@lucene.apache.org
 Subject: NGram query failing
 
 I have a requirement to be able to find hits within words in a free-form
 id field. The field can have any type of alphanumeric data - it's as
 likely it will be something like 123456 as it is to be SUN-123-ABC.
 I thought of using NGrams to accomplish the task, but I'm having a
 problem. I set up a field like this
 
 
 
 
 [schema XML stripped by the list archive; the surviving fragments show a
 field type with positionIncrementGap=100 and an n-gram tokenizer with
 minGramSize=1 maxGramSize=3]
 
 
 
 
 
 After indexing a field like this, the analysis page indicates my queries
 should work. If I give it a sample field value of ABC-123456-SUN and a
 query value of 45 it shows hits in several places, which is what I
 expected.
 
 
 
 However, when I actually query the field with something like "45" I get
 no hits back. Looking at the debugQuery output, it looks like it's
 taking my analyzed query text and putting it into a phrase query. So,
 for a query of "45" it turns into a phrase query of "4 5 45",
 which then doesn't hit on anything in my index.
 
 
 
 What am I missing to make this work?
 
 
 
 - Charlie



Re: Delete of non-existent record succeeds

2009-11-11 Thread Eric Kilby

Rather than start a new thread, I'd like to follow up on this.  I'm going to
oversimplify but the basic question should be straightforward.

I currently have one very large SOLR index, and 5 small ones which contain
filtered subsets out of the big one and are used for faceting in one area of
our site.  The means by which we determine documents to go into the smaller
ones is somewhat expensive computationally, and involves hitting a database
and a machine learning system among other things.

The problem I'm considering is that when a document goes "inactive"
(indicated by a status field) in the big index, I'd like to remove it from
any of the small ones that it happens to be in.  This may be any of the 5 or
none at all, as they don't nearly cover the whole space.  I don't need to
keep inactive documents in the small indexes, and prefer to keep them small
for performance purposes.

So rather than doing the expensive process to figure out what, if any, of
the small indexes to issue the delete against, would it be terribly
expensive to issue 5 deletes against the 5 servers (cores) and have them not
match?  What is the overhead on the SOLR side internally to process a
(non-)delete in this case?  I'm hoping the main overhead on this is
bandwidth to issue the requests, which is not a concern since the code will
be running on the same machine as the SOLR instances.

I appreciate any advice on this matter, and congrats on the release of 1.4!


Yonik Seeley wrote:
 
 "delete" means "delete if it exists".
 
 Due to how lucene works, to get good performance deletes are actually
 buffered... when the method returns, the deletes haven't really been
 applied yet.
 




XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2009-11-11 Thread aseem cheema
I am trying to post a document with the following content using SolrJ:
<center>content</center>
I need the XML/HTML tags to be ignored. Even though this works fine in
analysis.jsp, this does not work with SolrJ, as the client escapes the
< and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not
strip those escaped tags. How can I achieve this? Any ideas will be
highly appreciated.

There is escapedTags in HTMLStripCharFilterFactory constructor. Is
there a way to get that to work?
Thanks
-- 
Aseem


Re: deployment questions

2009-11-11 Thread Walter Underwood
Either way works, but running Solr as a server means that you have an  
admin interface. That can be very useful. You will want it as soon as  
someone asks why some document is not the first hit for their favorite  
query.


wunder

On Nov 11, 2009, at 7:26 AM, Joel Nylund wrote:


[quoted message snipped; see the original thread above]







indexing on different server

2009-11-11 Thread Joel Nylund

Is it possible to index on one server and copy the files over?

thanks
Joel



Re: indexing on different server

2009-11-11 Thread Rafał Kuć
Hello!

 Is it possible to index on one server and copy the files over?

 thanks
 Joel


Yes, it is possible; look at the CollectionDistribution wiki page
(http://wiki.apache.org/solr/CollectionDistribution).

-- 
Regards,
 Rafał Kuć
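
Solr 1.4 also ships an HTTP-based alternative to those rsync scripts; a
sketch of the master side in solrconfig.xml (file names are examples):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>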



Re: Delete of non-existent record succeeds

2009-11-11 Thread Otis Gospodnetic
I'd go with just broadcasting the delete.  If I remember correctly, that's what 
we did at one place where we used vanilla Lucene with RMI (pre-Solr) and we 
didn't see any problems due to that (RMI, on the other hand).  Whether this 
will work for you depends on how often you'll need to do that, among other 
things.
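
A sketch of the broadcast approach with SolrJ (core names and the id
variable are made up; deleting an id that is absent is simply a no-op):

    String[] cores = {"small1", "small2", "small3", "small4", "small5"};
    for (String core : cores) {
        SolrServer s = new CommonsHttpSolrServer("http://localhost:8983/solr/" + core);
        s.deleteById(docId); // no-op on cores that never had the document
        s.commit();          // or batch deletes and commit once per core
    }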

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Eric Kilby ki...@stylefeeder.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 11, 2009 11:55:54 AM
 Subject: Re: Delete of non-existent record succeeds
 
 
 Rather than start a new thread, I'd like to follow up on this.
 [rest of the quoted message snipped; see the original message above]



Re: [DIH] blocking import operation

2009-11-11 Thread Sascha Szott
Noble,

Noble Paul wrote:
 DIH imports are really long running. There is a good chance that the
 connection times out or breaks in between.
Yes, you're right, I missed that point (in my case imports take no longer
than a minute).

 how about a callback?
Thanks for the hint. There was a discussion about adding a callback URL to
DIH a month ago, but it seems that no issue was raised. So, up to now it's
only possible to implement an appropriate Solr EventListener. Should we
open an issue for supporting callback URLs?

Best,
Sascha
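
For reference, the listener hook is wired up in data-config.xml; a sketch
(the class name is a placeholder):

    <document onImportEnd="com.example.ImportEndListener">
      <entity ... />
    </document>

where the class implements org.apache.solr.handler.dataimport.EventListener
and receives an onEvent(Context) callback when the import finishes.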


 On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott sz...@zib.de wrote:
 Hi all,

 currently, DIH's import operation(s) only works asynchronously.
 Therefore,
 after submitting an import request, DIH returns immediately, while the
 import process (in case a large amount of data needs to be indexed)
 continues asynchronously behind the scenes.

 So, what is the recommended way to check if the import process has
 already
 finished? Or still better, is there any method / workaround that will
 block
 the import operation's caller until the operation has finished?

 In my application, the DIH receives some URL parameters which are used
 for
 determining the database name that is used within data-config.xml, e.g.

 http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

 Since only one DIH, /dataimport, is defined, but several database needs
 to
 be indexed, it is required to issue this command several times, e.g.

 http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

 ... wait until /dataimport?command=status says "Indexing completed" (but
 without using a loop that checks it again and again) ...

 http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false


 A suitable solution, at least IMHO, would be to have an additional DIH
 parameter which determines whether the import call is blocking on
 non-blocking, the default. As far as I see, this could be accomplished
 since
 Solr can execute more than one import operation at a time (it starts a
 new
 thread for each). Perhaps, my question is somehow related to the
 discussion
 [1] on ParallelDataImportHandler.

 Best,
 Sascha

 [1] http://www.lucidimagination.com/search/document/a9b26ade46466ee



weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from a nightly build about 2 months ago) and have this
defined in my schema:

<fieldType name="date" class="solr.DateField" sortMissingLast="true"
omitNorms="true" />

<field name="lastUpdate" type="date" indexed="true" stored="true"
default="NOW" multiValued="false" />

 

and following code that get executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO
NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh index with latest data (in documents).

This works fine, except that after a few days I start to see a few documents 
with no lastUpdate field (query -lastUpdate:[* TO *]) -- how can that be 
possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-11 Thread Peter Wolanin
It looks like the CJK one actually does 2-grams plus a little
separate processing on Latin text.

That's kind of interesting - in general can I build a custom tokenizer
from existing tokenizers that treats different parts of the input
differently based on the utf-8 range of the characters?  E.g. use a
porter stemmer for stretches of Latin text and n-gram or something
else for CJK?

-Peter

On Tue, Nov 10, 2009 at 9:21 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Yes, that's the n-gram one.  I believe the existing CJK one in Lucene is 
 really just an n-gram tokenizer, so no different than the normal n-gram 
 tokenizer.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Peter Wolanin peter.wola...@acquia.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 10, 2009 7:34:37 PM
 Subject: Re: any docs on solr.EdgeNGramFilterFactory?

 So, this is the normal N-gram one?  NGramTokenizerFactory

 Digging deeper - there are actually CJK and Chinese tokenizers in the
 Solr codebase:

 http://lucene.apache.org/solr/api/org/apache/solr/analysis/CJKTokenizerFactory.html
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/ChineseTokenizerFactory.html

 The CJK one uses the lucene CJKTokenizer
 http://lucene.apache.org/java/2_9_1/api/contrib-analyzers/org/apache/lucene/analysis/cjk/CJKTokenizer.html

 and there seems to be another one even that no one has wrapped into Solr:
 http://lucene.apache.org/java/2_9_1/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/package-summary.html

 So seems like the existing options are a little better than I thought,
 though it would be nice to have some docs on properly configuring
 these.

 -Peter

 On Tue, Nov 10, 2009 at 6:05 PM, Otis Gospodnetic
 wrote:
  Peter,
 
  For CJK and n-grams, I think you don't want the *Edge* n-grams, but just
 n-grams.
  Before you take the n-gram route, you may want to look at the smart Chinese
 analyzer in Lucene contrib (I think it works only for Simplified Chinese) and
 Sen (on java.net).  I also spotted a Korean analyzer in the wild a few months
 back.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: Peter Wolanin
  To: solr-user@lucene.apache.org
  Sent: Tue, November 10, 2009 4:06:52 PM
  Subject: any docs on solr.EdgeNGramFilterFactory?
 
  This fairly recent blog post:
 
 
 http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
 
  describes the use of the solr.EdgeNGramFilterFactory as the tokenizer
  for the index.  I don't see any mention of that tokenizer on the Solr
  wiki - is it just waiting to be added, or is there any other
  documentation in addition to the blog post?  In particular, there was
  a thread last year about using an N-gram tokenizer to enable
  reasonable (if not ideal) searching of CJK text, so I'd be curious to
  know how people are configuring their schema (with this tokenizer?)
  for that use case.
 
  Thanks,
 
  Peter
 
  --
  Peter M. Wolanin, Ph.D.
  Momentum Specialist,  Acquia. Inc.
  peter.wola...@acquia.com
 
 



 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com





-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


[DIH] concurrent requests to DIH

2009-11-11 Thread Sascha Szott
Hi all,

I'm using the DIH in a parameterized way by passing request parameters
that are used inside of my data-config. All imports end up in the same
index.

1. Is it considered as good practice to set up several DIH request
handlers, one for each possible parameter value?

2. In case the range of parameter values is broad, it's not convenient to
define separate request handlers for each value. But this entails a
limitation (as far as I see): it is not possible to fire several requests
to the same DIH handler (with different parameter values) at the same
time. However, in case several request handlers would be used (as in 1.),
concurrent requests (to the different handlers) are possible. So, how to
overcome this limitation?

Best,
Sascha


add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Hey Guys,
How do I add HTML/XML documents using SolrJ such that they do not
bypass the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added to
a field with HTMLStripCharFilter on the field using SolrJ, is not
stripped of the <center> tags. But if checked in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:

<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.

-- 
Aseem


Configuring Solr to use RAMDirectory

2009-11-11 Thread Thomas Nguyen
Is it possible to configure Solr to fully load indexes in memory?  I
wasn't able to find any documentation about this on either their site or
in the Solr 1.4 Enterprise Search Server book.



Re: complex queries

2009-11-11 Thread Vicky_Dev


Hi Erik,

Is it possible to feed the results of one Solr query into another Solr query?

The issue I am facing right now: I get results from one query and need just
2 indexed attribute values. These attribute values are then used to form a
new query to Solr.

Since Solr returns results only for GET requests, there is a restriction on
forming a query with all the values.

Please do send your views on the above problem.

Thanks
~Vikrant 
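
One workaround for the GET length restriction: Solr's search handler also
accepts POST, and with SolrJ that is a one-argument change (a sketch):

    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);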



Erik Hatcher wrote:
 
 
 On May 6, 2008, at 8:57 PM, Kevin Osborn wrote:
 I don't think this is possible, but I figure that I would ask.

 So, I want to find documents that match a search term and where a  
 field in those documents are also in the results of a subquery.  
 Basically, I am looking for the Solr equivalent of doing a SQL IN  
 clause.
 
 search clause AND field:(value1 OR value2 OR value3)
 
 does that do the trick for you?If not, could you elaborate with  
 an example?
 
   Erik
 
 
 




Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-11 Thread Robert Muir
Peter, here is a project that does this:
http://issues.apache.org/jira/browse/LUCENE-1488


 That's kind of interesting - in general can I build a custom tokenizer
 from existing tokenizers that treats different parts of the input
 differently based on the utf-8 range of the characters?  E.g. use a
 porter stemmer for stretches of Latin text and n-gram or something
 else for CJK?

 [rest of the quoted thread snipped; see the original messages above]





-- 
Robert Muir
rcm...@gmail.com


Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread Ryan McKinley
The HTMLStripCharFilter will strip the HTML for the *indexed* terms;
it does not affect the *stored* field.


If you don't want html in the stored field, can you just strip it out  
before passing to solr?
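
A crude client-side strip for simple markup (a sketch; a real HTML parser
is safer for arbitrary input):

    // drop tags, collapse whitespace, then hand the cleaned text to Solr
    String clean = rawHtml.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    doc.addField("text", clean);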



On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:


[quoted message snipped; see the original message above]




Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Ohhh... you are a life saver... thank you so much. It makes sense.

Aseem

On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley ryan...@gmail.com wrote:
 The HTMLStripCharFilter will strip the HTML for the *indexed* terms; it does
 not affect the *stored* field.

 If you don't want html in the stored field, can you just strip it out before
 passing to solr?


 On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:

 [rest of the quoted thread snipped; see above]





-- 
Aseem


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2009-11-11 Thread aseem cheema
Alright. It turns out that escapedTags is not for what I thought it was for.
The problem that I was having with HTMLStripCharFilterFactory is that
it strips the HTML while indexing the field, but not while storing the
field. That is why what I see in analysis.jsp, which is index-time
analysis, does not match what gets stored... because, well, HTML is
stripped only for indexing. Makes so much sense.

Thanks to Ryan McKinley for clarifying this.
Aseem

On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote:
 I am trying to post a document with the following content using SolrJ:
 <center>content</center>
 I need the XML/HTML tags to be ignored. Even though this works fine in
 analysis.jsp, this does not work with SolrJ, as the client escapes the
 < and > with &lt; and &gt;, and HTMLStripCharFilterFactory does not
 strip those escaped tags. How can I achieve this? Any ideas will be
 highly appreciated.

 There is escapedTags in HTMLStripCharFilterFactory constructor. Is
 there a way to get that to work?
 Thanks
 --
 Aseem




-- 
Aseem


Re: [DIH] concurrent requests to DIH

2009-11-11 Thread Avlesh Singh

 1. Is it considered as good practice to set up several DIH request
 handlers, one for each possible parameter value?

Nothing wrong with this. My assumption is that you want to do this to speed
up indexing. Each DIH instance would block all others, once a Lucene commit
for the former is performed.
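
Registering a second handler is just another entry in solrconfig.xml; a
sketch (handler name and config file are examples):

    <requestHandler name="/dataimport-foo"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config-foo.xml</str>
      </lst>
    </requestHandler>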

2. In case the range of parameter values is broad, it's not convenient to
 define separate request handlers for each value. But this entails a
 limitation (as far as I see): it is not possible to fire several requests
 to the same DIH handler (with different parameter values) at the same
 time.

Nope.

I had done a similar exercise in my quest to write a
ParallelDataImportHandler. This thread might be of interest to you -
http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler.
Though there is a ticket in JIRA, I haven't been able to contribute this
back. If you think this is what you need, lemme know.

Cheers
Avlesh

On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott sz...@zib.de wrote:

 Hi all,

 I'm using the DIH in a parameterized way by passing request parameters
 that are used inside of my data-config. All imports end up in the same
 index.

 1. Is it considered good practice to set up several DIH request
 handlers, one for each possible parameter value?

 2. In case the range of parameter values is broad, it's not convenient to
 define separate request handlers for each value. But this entails a
 limitation (as far as I see): It is not possible to fire several requests
 to the same DIH handler (with different parameter values) at the same
 time. However, if several request handlers were used (as in 1.),
 concurrent requests (to the different handlers) are possible. So, how to
 overcome this limitation?

 Best,
 Sascha



Re: Configuring Solr to use RAMDirectory

2009-11-11 Thread Otis Gospodnetic
I think not out of the box, but look at the SOLR-243 issue in JIRA.

You could also put your index on a RAM disk (tmpfs), but it would be of no
use for writing, since anything written there is lost on restart.
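
A rough sketch of the tmpfs route, assuming Linux (mount point, size, and
paths are illustrative):

mount -t tmpfs -o size=2g tmpfs /mnt/solr-ram
cp -r /path/to/solr/data /mnt/solr-ram/

Then point <dataDir> in solrconfig.xml at /mnt/solr-ram/data before starting
Solr.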

Note that when people ask about loading the whole index in memory explicitly, 
it's often a premature optimization attempt.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Thomas Nguyen thngu...@ign.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 11, 2009 8:46:11 PM
 Subject: Configuring Solr to use RAMDirectory
 
 Is it possible to configure Solr to fully load indexes in memory?  I
 wasn't able to find any documentation about this on either their site or
 in the Solr 1.4 Enterprise Search Server book.



Re: weird problem with solr.DateField

2009-11-11 Thread Otis Gospodnetic
Try changing:
<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW"
multiValued="false" />

to:

<field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW"
multiValued="false" required="true" />
 
Then watch the logs for errors during indexing.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: siping liu siping...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 11, 2009 7:29:18 PM
 Subject: weird problem with solr.DateField
 
 
 Hi,
 
 I'm using Solr 1.4 (from a nightly build about 2 months ago) and have this
 defined in my schema.xml:
 
 <fieldType name="date" class="solr.DateField" omitNorms="true" />
 
 <field name="lastUpdate" type="date" indexed="true" stored="true" default="NOW"
 multiValued="false" />
 
 and the following code that gets executed once every night:
 
 CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
 solrServer.setRequestWriter(new BinaryRequestWriter());
 
 // re-add the fresh batch; lastUpdate gets its default of NOW at index time
 solrServer.add(documents);
 solrServer.commit();
 
 // then delete everything that was not refreshed in the last two hours
 UpdateResponse deleteResult = solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
 solrServer.commit();
 
 
 
 The purpose is to refresh the index with the latest data (in documents).
 
 This works fine, except that after a few days I start to see a few documents 
 with no lastUpdate field (query -lastUpdate:[* TO *]) -- how can that be 
 possible?
 
 
 
 thanks in advance.
 
 
   



Re: [DIH] blocking import operation

2009-11-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yes, open an issue. This is a trivial change

On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott sz...@zib.de wrote:
 Noble,

 Noble Paul wrote:
 DIH imports are really long running. There is a good chance that the
 connection times out or breaks in between.
 Yes, you're right, I missed that point (in my case imports take no longer
 than a minute).

 how about a callback?
 Thanks for the hint. There was a discussion on adding a callback URL to
 DIH a month ago, but it seems that no issue was raised. So, up to now it's
 only possible to implement an appropriate Solr EventListener. Should we
 open an issue for supporting callback URLs?
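
 For the record, a rough sketch of the EventListener route, assuming Solr
 1.4's DIH contract (the class name is hypothetical); it is wired up via
 <document onImportEnd="com.example.ImportEndListener"> in data-config.xml:

 package com.example;

 import org.apache.solr.handler.dataimport.Context;
 import org.apache.solr.handler.dataimport.EventListener;

 // DIH invokes this once the import has ended, e.g. to ping a callback URL.
 public class ImportEndListener implements EventListener {
     public void onEvent(Context ctx) {
         System.out.println("DIH import finished");
     }
 }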

 Best,
 Sascha


 On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott sz...@zib.de wrote:
 Hi all,

 currently, DIH's import operations only work asynchronously. Therefore,
 after submitting an import request, DIH returns immediately, while the
 import process (in case a large amount of data needs to be indexed)
 continues asynchronously behind the scenes.

 So, what is the recommended way to check if the import process has
 already finished? Or better still, is there any method / workaround that
 will block the import operation's caller until the operation has finished?

 In my application, the DIH receives some URL parameters which are used
 for determining the database name that is used within data-config.xml, e.g.

 http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

 Since only one DIH, /dataimport, is defined, but several databases need
 to be indexed, it is required to issue this command several times, e.g.

 http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

 ... wait until /dataimport?command=status says Indexing completed (but
 without using a loop that checks it again and again) ...

 http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false


 A suitable solution, at least IMHO, would be to have an additional DIH
 parameter which determines whether the import call is blocking or
 non-blocking (the default). As far as I see, this could be accomplished
 since Solr can execute more than one import operation at a time (it starts
 a new thread for each). Perhaps my question is somehow related to the
 discussion [1] on ParallelDataImportHandler.

 Best,
 Sascha

 [1] http://www.lucidimagination.com/search/document/a9b26ade46466ee





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: ${dataimporter.delta.twitter_id} not getting populated in deltaImportQuery

2009-11-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are you sure the data comes back under the same name? Some DBs return the
field names in ALL CAPS.

You may also try doing a delta import using a full import:

http://wiki.apache.org/solr/DataImportHandlerFaq#My_delta-import_goes_out_of_memory_._Any_workaround_.3F
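
The FAQ's workaround, roughly: fold the delta condition into the main query
and run a non-cleaning full import (table and column names are illustrative):

<entity name="item" pk="id"
        query="select * from item
               where '${dataimporter.request.clean}' != 'false'
                  or last_modified &gt; '${dataimporter.last_index_time}'">
</entity>

invoked with /dataimport?command=full-import&clean=false, so the existing
index is not wiped first.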

On Wed, Nov 11, 2009 at 9:55 PM, Mark Ellul m...@catalystic.com wrote:
 I have 2 entities from the root node, not sure if that makes a difference!

 On Wed, Nov 11, 2009 at 4:49 PM, Mark Ellul m...@catalystic.com wrote:

 Hi,

 I have an interesting issue...

 Basically I am trying to do delta imports on Solr 1.4 against a PostgreSQL 8.3
 database.

 Basically, when I am running a delta import with the entity below, I get an
 exception (see below the entity definition) showing the query it's trying to
 run, and you can see that it's not populating the where clause of my
 deltaImportQuery.

 I have tried ${dataimporter.delta.twitter_id} and ${dataimporter.delta.id}
 and get the same exceptions.

 Am I missing something obvious?

 Any help would be appreciated!

 Regards

 Mark


 <entity name="Tweeter" pk="twitter_id"
   query="select twitter_id,
          twitter_id as pk,
          1 as site_id,
          screen_name
          from api_tweeter WHERE
          tweet_mapreduce_on IS NOT NULL;"
   transformer="TemplateTransformer"
   deltaImportQuery="select twitter_id,
          twitter_id as pk,
          1 as site_id,
          screen_name
          from api_tweeter
          where twitter_id=${dataimporter.delta.twitter_id };"
   deltaQuery="select twitter_id from api_tweeter where modified_on >
          '${dataimporter.last_index_time}' and tweet_mapreduce_on IS NOT NULL;">

   <field name="twitter_id" column="twitter_id" />

 </entity>


 INFO: Completed parentDeltaQuery for Entity: Tweeter
 Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 SEVERE: Exception while processing: Tweeter document :
 SolrInputDocument[{}]
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query:                          select twitter_id,        twitter_id
 as pk,        1 as site_id,       screen_name   from api_tweeter     where
 twitter_id=;    Processing Document # 1
  at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
  at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
  at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
 at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
  at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
  at
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:276)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
  at
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
  at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
 Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at end of
 input
   Position: 1197
 at
 org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
 at
 org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
  at
 org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
 at
 org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
  at
 org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
 at
 org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:345)
  at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
 ... 11 more
 Nov 11, 2009 3:35:44 PM org.apache.solr.handler.dataimport.DataImporter
 doDeltaImport
 SEVERE: Delta Import Failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query:                          select twitter_id,        twitter_id
 as pk,        1 as site_id,       screen_name  from api_tweeter     where
 twitter_id=;    Processing Document # 1
  at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
  at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
  at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
 at
 

Re: Persist in Core Admin

2009-11-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Nov 12, 2009 at 3:13 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 It looks like our core admin wiki doesn't cover the persist action?
 http://wiki.apache.org/solr/CoreAdmin

 I'd like to be able to persist the cores to solr.xml, even if
 <solr persistent="false"> is set.  It seems like the persist action does this?
Yes, but you will have to specify a 'file' parameter.
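
For example, assuming the default core admin path and target file name,
something like:

http://localhost:8983/solr/admin/cores?action=PERSIST&file=solr.xml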




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com