Re: DataImportHandler Full Import completed successfully after SQLException

2009-06-25 Thread George
Noble, thank you for fixing this issue! :)

2009/6/25 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 OK, this should be a bug in JdbcDataSource.

 look at the line

 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)

 it is swallowing the exception: it logs the error and carries on instead of
 propagating it. I shall raise an issue
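
 A rough sketch of the kind of fix I mean (hypothetical; this is not the
 current JdbcDataSource code, just the pattern of wrapping and rethrowing
 so DocBuilder can honour the default onError='abort'):

 // hypothetical sketch of ResultSetIterator.hasnext(): instead of logging the
 // SQLException and returning, wrap and rethrow it so the import aborts and
 // does not report "completed successfully"
 private boolean hasnext() {
   try {
     if (resultSet.next()) {
       return true;
     }
     close();
     return false;
   } catch (SQLException e) {
     close();
     throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
         "Error reading data from ResultSet", e);
   }
 }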

 thanks


 On Wed, Jun 24, 2009 at 11:12 PM, Georgeemagp...@gmail.com wrote:
  Hi,
  Yesterday I ran into the following exception while trying to index from an
  Oracle database in my indexing process:
 
  2009-06-23 14:57:29,205 WARN
   [org.apache.solr.handler.dataimport.JdbcDataSource] Error reading data
  java.sql.SQLException: ORA-01555: snapshot too old: rollback segment
 number
  1 with name _SYSSMU1$ too small
 
  at
 
 oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
  at
 oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:110)
  at
 
 oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:171)
  at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
  at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
  at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
  at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
  at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1000)
  at
 
 oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
  at
 oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
  at
 
 org.jboss.resource.adapter.jdbc.WrappedResultSet.next(WrappedResultSet.java:1184)
  at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
  at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:223)
  at
 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:258)
  at
 
 org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:73)
  at
 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
  at
 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:231)
  at
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
  at
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
  at
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
  at
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
  at
 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
  at
 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
  2009-06-23 14:57:29,206 INFO
   [org.apache.solr.handler.dataimport.DocBuilder] Full Import completed
  successfully
  2009-06-23 14:57:29,206 INFO  [org.apache.solr.update.UpdateHandler]
 start
  commit(optimize=true,waitFlush=false,waitSearcher=true)
 
  As you can see, Full Import completed successfully after indexing only a part
  (about 7) of all the expected documents (about 15). I don't know whether this
  is a bug or not, but it certainly isn't the behaviour I expect in this
  situation. It should have rolled back, shouldn't it?
 
  Reading the Solr code, I can see that line 314 of JdbcDataSource.java
  throws a DataImportHandlerException with a SEVERE errCode, so I can't
  understand why my indexing process finishes successfully.
 
  I'm working with Solr trunk version (rev. 785397) and no custom
 properties
  (i.e. onError value is default 'abort') in DataImportHandler.
 
  George
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Solr document security

2009-06-25 Thread pof

Hi, I want to add document-level security that works as follows: an
external process makes a query to the index and, depending on the user's
security allowances (based on a login id), a list of hits is returned minus
any documents the user isn't meant to know even exist. I was thinking maybe a
custom filter with a JDBC connection to check the security of the user vs. the
document. I'm not sure how I would add the filter, how to write the filter, or
how to get the login id from a GET parameter. Any suggestions, comments etc.?

Thanks. Brett. 
-- 
View this message in context: 
http://www.nabble.com/Solr-document-security-tp24197620p24197620.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: How do I set up an embedded server with version 1.3.0 ?

2009-06-25 Thread Ian Smith
From: Manepalli, Kalyan [mailto:kalyan.manepa...@orbitz.com] 
Sent: 24 June 2009 19:47
To: solr-user@lucene.apache.org
Subject: RE: How do I set up an embedded server with version 1.3.0 ?

Hi Ian,
I use the embeddedSolrServer from a Solr Component. 
The code for invoking the embeddedSolrServer looks like this

SolrServer locServer = new EmbeddedSolrServer(
    SolrCore.getCoreDescriptor().getCoreContainer(), locationCore);
where locationCore is the core name in the multicore environment.
In single core env you can pass 


Thanks,
Kalyan Manepalli



Hi Kalyan,

Thanks for the reply, but it does not work for me as getCoreDescriptor()
is NOT a static method of SolrCore.  So, I am still left trying to
instantiate a SolrCore instance to pass to the EmbeddedSolrServer
constructor.

Can you or someone else possibly help me with a working SolrCore
constructor call?

TIA,

Ian.

Website Content Management

Tamar Science Park, 15 Research Way, Plymouth, PL6 8BT

Save a tree, think before printing this email.





Fwd: [Solr Wiki] Update of SolrReplication by NoblePaul

2009-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
some of the replication commands have been changed in the trunk
https://issues.apache.org/jira/browse/SOLR-1216

So please keep this in mind if you are already using it and are
upgrading to a new build. Refer to the wiki for the latest commands.


-- Forwarded message --
From: Apache Wiki wikidi...@apache.org
Date: Thu, Jun 25, 2009 at 2:24 PM
Subject: [Solr Wiki] Update of SolrReplication by NoblePaul
To: solr-comm...@lucene.apache.org


Dear Wiki user,

You have subscribed to a wiki page or wiki category on Solr Wiki for
change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/SolrReplication

The comment on the change is:
command names changed

--
         <!--Replicate on 'optimize'. Other values can be 'commit',
'startup'. It is possible to have multiple entries of this config
string-->
         <str name="replicateAfter">optimize</str>

-         <!--Create a snapshot for backup purposes on 'optimize'.
Other values can be 'commit', 'startup'. It is possible to have
multiple entries of this config string-->
+         <!--Create a snapshot for backup purposes on 'optimize'.
Other values can be 'commit', 'startup'. It is possible to have
multiple entries of this config string. Note that this is just for
backup. Replication does not require this -->
-         <str name="replicateAfter">optimize</str>
+         <str name="snapshot">optimize</str>

         <!--If configuration files need to be replicated give the
names here, separated by comma -->
         <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
@@ -107, +107 @@


 == HTTP API ==
 These commands can be invoked over HTTP to the !ReplicationHandler
-  * Abort copying snapshot from master to slave command :
http://slave_host:port/solr/replication?command=abort
+  * Abort copying index from master to slave command :
http://slave_host:port/solr/replication?command=abortfetch
  * Force a snapshot on master. This is useful to take periodic
backups .command :
http://master_host:port/solr/replication?command=snapshoot
-  * Force a snap pull on slave from master command :
http://slave_host:port/solr/replication?command=snappull
+  * Force a snap pull on slave from master command :
http://slave_host:port/solr/replication?command=fetchindex
   * It is possible to pass an extra attribute 'masterUrl' or other
attributes like 'compression' (or any other parameter which is
specified in the <lst name="slave"> tag) to do a one-time replication
from a master. This obviates the need for hardcoding the master in the
slave.
  * Disable polling for snapshot from slave command :
http://slave_host:port/solr/replication?command=disablepoll
  * Enable polling for snapshot from slave command :
http://slave_host:port/solr/replication?command=enablepoll
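
For example, any of these commands can be triggered from plain Java with a
simple HTTP GET. A minimal sketch only; the host, port and webapp path are
assumptions for illustration:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TriggerFetchIndex {
    public static void main(String[] args) throws Exception {
        // assumed slave location; adjust host/port/webapp path to your setup
        URL url = new URL("http://localhost:8983/solr/replication?command=fetchindex");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("fetchindex returned HTTP " + conn.getResponseCode());
        InputStream in = conn.getInputStream();   // drain and close the response body
        in.close();
        conn.disconnect();
    }
}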



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How do I set up an embedded server with version 1.3.0 ?

2009-06-25 Thread Shalin Shekhar Mangar
On Thu, Jun 25, 2009 at 2:05 PM, Ian Smith ian.sm...@gossinteractive.comwrote:


 Can you or someone else possibly help me with a working SolrCore
 constructor call?


Here is a working example for single index/core:

System.setProperty("solr.solr.home",
    "/home/shalinsmangar/work/oss/branch-1.3/example/solr");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", 101);
server.add(doc);
server.commit(true, true);
SolrQuery query = new SolrQuery();
query.setQuery("id:101");
QueryResponse response = server.query(query);
SolrDocumentList list = response.getResults();
System.out.println("list.size() = " + list.size());
coreContainer.shutdown();

Make sure your dataDir in solrconfig.xml is fixed (absolute); otherwise your
data directory will get created relative to the current working directory
(or you could set a system property for solr.data.dir).
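
For example (a sketch; the path below is just an illustration):

    // set before the CoreContainer is initialized so the index lives at an
    // absolute location instead of one relative to the working directory
    System.setProperty("solr.data.dir",
        "/home/shalinsmangar/work/oss/branch-1.3/example/solr/data");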

Hope that helps. I'll add it to the wiki too.
-- 
Regards,
Shalin Shekhar Mangar.


Top tf_idf in TermVectorComponent

2009-06-25 Thread JCodina

In order to perform any further study of the result set, like clustering, the
TermVectorComponent gives the list of words with the corresponding tf and idf,
but this list can be huge for each document, and most of the terms may have
a low tf or a too high df.
Maybe it is useful to compare the relative increment of DF against the
collection in order to improve the facets (show only those terms whose
relative DF in the query is higher than in the full collection).

To do this it would be interesting if the TermVectorComponent could
sort the results by one of these options:
* tf
* DF
* tf/df (to simplify), or tf*idf where idf is computed as log(total_docs/df)
and truncate the list to a given number of words or a cutoff value.

Or maybe there is another way to do this?
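
To illustrate, this is the kind of client-side ranking I have in mind; just a
sketch, assuming the per-term tf and df counts have already been extracted
from the TermVectorComponent response (the numbers below are made up):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopTfIdf {
    static final long TOTAL_DOCS = 1000;          // assumed collection size

    public static void main(String[] args) {
        // term -> {tf in this document, df in the collection}; made-up numbers
        Map<String, long[]> counts = new HashMap<String, long[]>();
        counts.put("clustering", new long[]{5, 12});
        counts.put("the", new long[]{40, 950});
        counts.put("solr", new long[]{3, 80});

        List<Map.Entry<String, long[]>> terms =
            new ArrayList<Map.Entry<String, long[]>>(counts.entrySet());
        Collections.sort(terms, new Comparator<Map.Entry<String, long[]>>() {
            public int compare(Map.Entry<String, long[]> a, Map.Entry<String, long[]> b) {
                return Double.compare(tfIdf(b.getValue()), tfIdf(a.getValue()));
            }
        });

        int limit = 2;                             // truncate to the top N terms
        for (int i = 0; i < limit && i < terms.size(); i++) {
            Map.Entry<String, long[]> e = terms.get(i);
            System.out.println(e.getKey() + " -> " + tfIdf(e.getValue()));
        }
    }

    static double tfIdf(long[] tfDf) {
        return tfDf[0] * Math.log((double) TOTAL_DOCS / tfDf[1]);
    }
}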
Joan
-- 
View this message in context: 
http://www.nabble.com/Top-tf_idf-in-TermVectorComponent-tp24201076p24201076.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: How do I set up an embedded server with version 1.3.0 ?

2009-06-25 Thread Ian Smith
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: 25 June 2009 10:41
To: solr-user@lucene.apache.org
Subject: Re: How do I set up an embedded server with version 1.3.0 ?

On Thu, Jun 25, 2009 at 2:05 PM, Ian Smith
ian.sm...@gossinteractive.comwrote:


 Can you or someone else possibly help me with a working SolrCore 
 constructor call?


Here is a working example for single index/core:

System.setProperty("solr.solr.home",
    "/home/shalinsmangar/work/oss/branch-1.3/example/solr");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", 101);
server.add(doc);
server.commit(true, true);
SolrQuery query = new SolrQuery();
query.setQuery("id:101");
QueryResponse response = server.query(query);
SolrDocumentList list = response.getResults();
System.out.println("list.size() = " + list.size());
coreContainer.shutdown();

Make sure your dataDir in solrconfig.xml is fixed (absolute) otherwise
your data directory will get created relative to the current working
directory (or you could set a system property for solr.data.dir)

Hope that helps. I'll add it to the wiki too.
--
Regards,
Shalin Shekhar Mangar.
---

Fantastic, I now have the embedded server working, thank you!

PS. Sorry about all the huge sigs, I don't have the facility to suppress
them from work, I'll post from a webmail account in future . . .

Ian.

Website Content Management

Tamar Science Park, 15 Research Way, Plymouth, PL6 8BT

Save a tree, think before printing this email.





Re: Solr document security

2009-06-25 Thread Norberto Meijome
On Wed, 24 Jun 2009 23:20:26 -0700 (PDT)
pof melbournebeerba...@gmail.com wrote:

 
 Hi, I am wanting to add document-level security that works as following: An
 external process makes a query to the index, depending on their security
 allowences based of a login id a list of hits are returned minus any the
 user are meant to know even exist. I was thinking maybe a custom filter with
 a JDBC connection to check security of the user vs. the document. I'm not
 sure how I would add the filter or how to write the filter or how to get the
 login id from a GET parameter. Any suggestions, comments etc.?

Hi Brett,
(keeping in mind that I've been away from SOLR for 8 months, but I
don't think this was added of late)

the standard approach is to manage security @ your
application layer, not @ SOLR. I.e., search, return documents (which should
contain some kind of data to identify their ACL) and then you can decide
whether to show each one or not. 
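
e.g. something along these lines with SolrJ; just a sketch, where the 'acl'
field name, the Solr URL and the group lookup are all assumptions that belong
in your own app:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AclFilteredSearch {
    public static void main(String[] args) throws Exception {
        // assumed Solr URL and field names; adjust to your setup
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        String userGroup = lookupGroupForLogin("brett");   // your own JDBC/LDAP lookup

        SolrQuery q = new SolrQuery("title:report");
        // only documents whose acl field contains the user's group come back,
        // so the user never sees hits they are not allowed to know exist
        q.addFilterQuery("acl:" + userGroup);
        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }

    // placeholder for the application's own security lookup
    static String lookupGroupForLogin(String loginId) {
        return "staff";
    }
}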

HIH
_
{Beto|Norberto|Numard} Meijome

They never open their mouths without subtracting from the sum of human
knowledge. Thomas Brackett Reed

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Function query using Map

2009-06-25 Thread David Baker

Noble Paul നോബിള്‍ नोब्ळ् wrote:

The five-parameter feature was added in Solr 1.4. Which version of Solr
are you using?

On Wed, Jun 24, 2009 at 12:57 AM, David Baker dav...@mate1inc.com wrote:
  

Hi,

I'm trying to use the map function with a function query.  I want to map a 
particular value to 1 and all other values to 0.  We currently use the map 
function that has 4 parameters with no problem.  However, for the map function 
with 5 parameters, I get a parse error.  The following are the query and error 
returned:

_query_
id:[* TO *] _val_:map(ethnicity,3,3,1,0)

_error message_

*type* Status report
*message* _org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] 
_val_:map(ethnicity,3,3,1,0)': Expected ')' at position 20 in 
'map(ethnicity,3,3,1,0)'_
*description* _The request sent by the client was syntactically incorrect 
(org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] 
_val_:map(ethnicity,3,3,1,0)': Expected ')' at position 20 in 
'map(ethnicity,3,3,1,0)').
_

It appears that the parser never evaluates the map string for anything other 
than the 4-parameter version.  Could anyone give me some insight into this?  
Thanks in advance.






--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
  

we're running 1.3, which explains this. Thanks for the response.
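
(For reference, a minimal SolrJ sketch of how we expect to issue the
5-argument form once we are on 1.4; 'ethnicity' is our integer field and the
quoting of the _val_ hook is an assumption on my part:)

import org.apache.solr.client.solrj.SolrQuery;

public class MapFunctionQuerySketch {
    public static void main(String[] args) {
        // map(x,min,max,target,default): ethnicity == 3 contributes 1, anything else 0
        SolrQuery q = new SolrQuery("id:[* TO *] _val_:\"map(ethnicity,3,3,1,0)\"");
        System.out.println(q.getQuery());
    }
}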


Python Response Bug?

2009-06-25 Thread Michael Beccaria
I'm not sure, but I think I ran across some unexpected behavior in the
python response in solr 1.3 (1.3.0 694707 - grantingersoll - 2008-09-12
11:06:47).

I'm running Python 2.5 and using eval to convert the string solr returns
to python data objects.

I have a blank field labeled 'contacthours' in the XML file that I am
importing. If I make the field a string type, the interpreter doesn't throw
an error. If I make the field a float type, Python throws an error in the
eval call. Here is a portion of the output from Solr, with the two
differences near the end at the 'contacthours' field:


Broken (Python didn't like this):
{'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p
ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
','assessmentcourseprograminstitutionimprovement':u'The students\u2019
final reports along with the \u201cBiology Externship Completion
Form\u201d, which is completed by externship supervisors, will be used
for this assessment.','contacthourrationale':'Variable hours - no
explanation given','contacthours':,'coursearea':'BIO',


Good (this worked):
{'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p
ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
','assessmentcourseprograminstitutionimprovement':u'The students\u2019
final reports along with the \u201cBiology Externship Completion
Form\u201d, which is completed by externship supervisors, will be used
for this assessment.','contacthourrationale':'Variable hours - no
explanation given','contacthours':'','coursearea':'BIO',


Any insights? Is this a bug or am I missing something?

Mike Beccaria 
Systems Librarian 
Head of Digital Initiatives 
Paul Smith's College 
518.327.6376 
mbecca...@paulsmiths.edu 
 



Re: Python Response Bug?

2009-06-25 Thread darren
The first JSON is invalid, as you can see, because of the missing value.

':,' is not valid JSON syntax.

The reason it works for the string type is that the empty string '' is a
valid field value, but _nothing_ (as in the first example) is not. There is no
empty float placeholder that JSON likes, so it either has to be an empty
string or some legitimate float value.

AFAIK, Solr doesn't generate empty numeric values compatible with JSON;
zero and null are different things.



 I'm not sure, but I think I ran across some unexpected behavior in the
 python response in solr 1.3 (1.3.0 694707 - grantingersoll - 2008-09-12
 11:06:47).

 I'm running Python 2.5 and using eval to convert the string solr returns
 to python data objects.

 I have a blank field in my xml file that I am importing labeled
 contacthours. If I make the field a string type, the interpreter
 doesn't throw an error. If I make the field a float type, python
 throws an error on the eval function. Here is portion of the output from
 solr, with the two differences near the end by the contacthours
 variable:


 Broken (Python didn't like this):
 {'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p
 ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':,'coursearea':'BIO',


 Good (this worked):
 {'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p
 ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':'','coursearea':'BIO',


 Any insights? Is this a bug or am I missing something?

 Mike Beccaria
 Systems Librarian
 Head of Digital Initiatives
 Paul Smith's College
 518.327.6376
 mbecca...@paulsmiths.edu






RE: Python Response Bug?

2009-06-25 Thread Michael Beccaria
Regardless, I think it should return valid JSON so programs don't crash
when trying to interpret it. I don't think about these things often so
maybe I'm missing something obvious, but I think putting in an empty
string is better than putting in nothing and having it break.

My 2 cents.
Mike


-Original Message-
From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] 
Sent: Thursday, June 25, 2009 11:11 AM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Python Response Bug?

The first JSON is invalid as you see because of the missing value.

:, is not valid JSON syntax.

The reason it works for string type is because the empty string '' is a
 valid field value but _nothing_ (the first example) is not. There is no
empty float placeholder that JSON likes. So either it has to be an empty
string or some legitimate float value.

 AFAIK, Solr doesn't generate empty numeric values compatible with JSON.
zero and null are different things.



 I'm not sure, but I think I ran across some unexpected behavior in the
 python response in solr 1.3 (1.3.0 694707 - grantingersoll -
2008-09-12
 11:06:47).

 I'm running Python 2.5 and using eval to convert the string solr
returns
 to python data objects.

 I have a blank field in my xml file that I am importing labeled
 contacthours. If I make the field a string type, the interpreter
 doesn't throw an error. If I make the field a float type, python
 throws an error on the eval function. Here is portion of the output
from
 solr, with the two differences near the end by the contacthours
 variable:


 Broken (Python didn't like this):

{'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p

ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':,'coursearea':'BIO',


 Good (this worked):

{'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p

ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':'','coursearea':'BIO',


 Any insights? Is this a bug or am I missing something?

 Mike Beccaria
 Systems Librarian
 Head of Digital Initiatives
 Paul Smith's College
 518.327.6376
 mbecca...@paulsmiths.edu






Re: building custom RequestHandlers

2009-06-25 Thread Julian Davchev
OK, I glued all the pieces together and ended up extending
handler.component.SearchHandler, because I want to use all its
functionality and only adjust the q param before it gets processed.

How exactly can I get the q and set it back later?
From digging through the code it seems this is the way:

SolrParams p = req.getParams();
String words = p.get("q");

but I get "SolrParams is deprecated" and a type mismatch "cannot convert
SolrParams to SolrParams".

Even like this:
SolrParams p = (SolrParams) req.getParams();

I get an error and a 500 when trying to use it.

Any pointers on how to get and set the param are more than welcome. At the end
I am calling super.handleRequestBody(req, rsp), so there is nothing else to mess with.
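
For reference, the pattern I am aiming for is roughly this (just a sketch;
note it imports org.apache.solr.common.params.SolrParams rather than the
deprecated org.apache.solr.request one, which I suspect is the source of my
type mismatch, and the package names are as of Solr 1.3):

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class AdjustedQueryHandler extends SearchHandler {
    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        SolrParams params = req.getParams();
        String words = params.get("q");

        // rewrite q however the application needs, then put it back
        ModifiableSolrParams adjusted = new ModifiableSolrParams(params);
        adjusted.set("q", words == null ? "*:*" : words.trim());
        req.setParams(adjusted);

        // let the normal SearchHandler components do the rest
        super.handleRequestBody(req, rsp);
    }
}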



Mats Lindh wrote:
 I wrote a small post regarding how to create an analysis filter about a year
 ago. I'm guessing that the process is quite similar when developing a custom
 request handler:

 http://e-mats.org/2008/06/writing-a-solr-analysis-filter-plugin/

 Hope that helps.

 --mats

 On Wed, Jun 24, 2009 at 12:54 PM, Julian Davchev j...@drun.net wrote:

   
 Well, it's really lovely what's in there, but that's just the configuration
 aspect. Is there a sample of where I should place my class, etc.,
 and how to compile it and all; just a simple top-to-bottom example? I guess
 most of those aspects might be Java, but they are Solr related as well.

 Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 this part of the doc explains what you should do to write a custom
   
 requesthandler
 
   
 http://wiki.apache.org/solr/SolrPlugins#head-7c0d03515c496017f6c0116ebb096e34a872cb61
 
 On Wed, Jun 24, 2009 at 3:35 AM, Julian Davchevj...@drun.net wrote:

   
 Is it just me or is this a thread steal? Nothing to do with what the thread is
 originally about.
 Cheers

 Bill Dueber wrote:

 
 Is it possible to change the javascript  output? I find some of the
 information choices (e.g., that facet information is returned in a flat
 list, with facet names in the even-numbered indexes and number-of-items
 following them in the odd-numbered indexes) kind of annoying.

 On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh 
   
 ep...@opensourceconnections.com
 
   
 wrote:


 
 Like most things JavaScript, I found that I had to just dig through it
 
 and
 
 play with it.  However, the Reuters demo site was very easy to
 
 customize to
 
 interact with my own Solr instance, and I went from there.


 On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:

  Never used it.. I am just looking in docs how can I extend solr but
 
 no
 
 
 luck so far :(
 Hoping for some docs or real extend example.



 Eric Pugh wrote:



   
 Are you using the JavaScript interface to Solr?
 http://wiki.apache.org/solr/SolrJS

 It may provide much of what you are looking for!

 Eric

 On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:

  I am using solr and php quite nicely.


 
 Currently the work flow includes some manipulation on php side so I
 correctly format the query string and pass to tomcat/solr.
 I somehow want to build own request handler in java so I skip the
   
 whole
 
 apache/php request that is just for formating.
 This will saves me tons of requests to apache since I use solr
   
 directly
 
 from javascript.

 Would like to ask if there is something ready that I can use and
   
 adjust.
 
 I am kinda new in Java but once I get the pointers
 I think should be able to pull out.
 Thanks,
 JD





   
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
 http://www.opensourceconnections.com
 Free/Busy: http://tinyurl.com/eric-cal







 
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
 http://www.opensourceconnections.com
 Free/Busy: http://tinyurl.com/eric-cal







 
   


   
 

   



RE: Python Response Bug?

2009-06-25 Thread darren
The problem is: what should Solr put for a float that is undefined? There
is no such value. Typically a value anomaly like this should not exist, and
when data structures break they reveal these situations. The correct action is
to explore why a non-value is making its way into the index and to correct it
upstream by applying your own logic to populate the field with a
tangible value.
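
For instance, if you happen to index with SolrJ you can simply not send the
field (or send a real default) when the source value is blank. A sketch only;
the URL and field names are made up:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexWithCleanFloats {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        String rawHours = "";                    // blank value coming from the source data
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "161");
        // only add the float field when there is a real number; otherwise either
        // skip it or substitute an agreed-upon default such as 0.0f
        if (rawHours != null && rawHours.trim().length() > 0) {
            doc.addField("contacthours", Float.valueOf(rawHours.trim()));
        }
        solr.add(doc);
        solr.commit();
    }
}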

 Regardless, I think it should return valid JSON so programs don't crash
 when trying to interpret it. I don't think about these things often so
 maybe I'm missing something obvious, but I think putting in an empty
 string is better than putting in nothing and having it break.

 My 2 cents.
 Mike


 -Original Message-
 From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
 Sent: Thursday, June 25, 2009 11:11 AM
 To: solr-user@lucene.apache.org
 Cc: solr-user@lucene.apache.org
 Subject: Re: Python Response Bug?

 The first JSON is invalid as you see because of the missing value.

 :, is not valid JSON syntax.

 The reason it works for string type is because the empty string '' is a
  valid field value but _nothing_ (the first example) is not. There is no
 empty float placeholder that JSON likes. So either it has to be an empty
 string or some legitimate float value.

  AFAIK, Solr doesn't generate empty numeric values compatible with JSON.
 zero and null are different things.



 I'm not sure, but I think I ran across some unexpected behavior in the
 python response in solr 1.3 (1.3.0 694707 - grantingersoll -
 2008-09-12
 11:06:47).

 I'm running Python 2.5 and using eval to convert the string solr
 returns
 to python data objects.

 I have a blank field in my xml file that I am importing labeled
 contacthours. If I make the field a string type, the interpreter
 doesn't throw an error. If I make the field a float type, python
 throws an error on the eval function. Here is portion of the output
 from
 solr, with the two differences near the end by the contacthours
 variable:


 Broken (Python didn't like this):

 {'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p

 ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':,'coursearea':'BIO',


 Good (this worked):

 {'responseHeader':{'status':0,'QTime':0,'params':{'q':'id:161','wt':'p

 ython'}},'response':{'numFound':1,'start':0,'docs':[{'approvalpending':'
 ','assessmentcourseprograminstitutionimprovement':u'The students\u2019
 final reports along with the \u201cBiology Externship Completion
 Form\u201d, which is completed by externship supervisors, will be used
 for this assessment.','contacthourrationale':'Variable hours - no
 explanation given','contacthours':'','coursearea':'BIO',


 Any insights? Is this a bug or am I missing something?

 Mike Beccaria
 Systems Librarian
 Head of Digital Initiatives
 Paul Smith's College
 518.327.6376
 mbecca...@paulsmiths.edu








Re: Python Response Bug?

2009-06-25 Thread Chris Hostetter

: Subject: Python Response Bug?
: In-Reply-To: 20090625214339.415e6...@suspectum.octantis.com.au

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss



Re: Reverse querying

2009-06-25 Thread AlexElba



Otis Gospodnetic wrote:
 
 
 Alex  Oleg,
 
 Look at MemoryIndex in Lucene's contrib.  It's the closest thing to what
 you are looking for.  What you are describing is sometimes referred to as
 prospective search, sometimes saved searches, and a few other names.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: AlexElba ramal...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 24, 2009 7:47:20 PM
 Subject: Reverse querying
 
 
 Hello,
 
 I have problem which I am trying to solve using solr.
 
 I have search text (term) and I have index full of words which are mapped
 to
 ids.
 
 Is there any query that I can run to do this?
 
 Example:
 
 Term
 3) A recommendation to use VAR=value in the configure command line will
  not work with some 'configure' scripts that comply to GNU standards
  but are not generated by autoconf. 
 
 Index docs
 
 id:1 name:recommendation 
 ...
 id:3 name:GNU
 id:4 name food
 
 after running query I want to get as results 1 and 3 
 
 Thanks
 
 -- 
 View this message in context: 
 http://www.nabble.com/Reverse-querying-tp24194777p24194777.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

Hello,
I looked into MemoryIndex; its search returns only a score, which
just tells me whether the document matches or not.

I built a test method based on the example.

Term:

On my last night in the Silicon Valley area, I decided to head up the east
side of San Francisco Bay to 
visit Vito’s Pizzeria located in Newark, California.  I have to say it was
excellent!  
I met the owner (Vito!) and after eating a couple slices I introduced
myself.  
I was happy to know he was familiar with the New York Pizza Blog and the New
York Pizza Finder directory.   
Once we got to talking he decided I NEEDED to try some bread sticks and
home-made marinara 
sauce and they were muy delicioso.  I finished off my late night snack with
a meatball dipped in the same marinara.


Data {Silicon Valley, New York, Chicago}


public static void find(String term, Set<String> data) throws Exception {

    Analyzer analyzer = PatternAnalyzer.EXTENDED_ANALYZER;
    MemoryIndex index = new MemoryIndex();
    int i = 0;
    for (String str : data) {
        index.addField("bn" + i, str, analyzer);
        i++;
    }
    QueryParser parser = new QueryParser("bn*", analyzer);
    Query query = parser.parse(URLEncoder.encode(term, "UTF-8"));
    float score = index.search(query);
    if (score > 0.0f) {
        System.out.println("it's a match");
    } else {
        System.out.println("no match found");
    }
    // System.out.println("indexData=" + index.toString());

}

no match found 


What am I doing wrong?

Thanks,
Alex
-- 
View this message in context: 
http://www.nabble.com/Reverse-querying-tp24194777p24208522.html
Sent from the Solr - User mailing list archive at Nabble.com.



Empty results after merging index via IndexMergeTool

2009-06-25 Thread jayakeerthi s
Hi All,


I am trying to merge two indexes using IndexMergeTool. The two indexes were
created using Solr 1.4 and are showing results fine.

I used the command below, as per
http://wiki.apache.org/solr/MergingSolrIndexes#head-feb9246bab59b54c0ba361d84981973976566c2a
to merge the two indexes:

java -cp
C:\jbdevstudio\jboss-eap\jboss-as\bin\Core\lib\lucene-core-2.9-dev.jar
C:\jbdevstudio\jboss-eap\jboss-as\bin\Core\lib\lucene-misc-2.4.1.jar
org.apache.lucene.misc.IndexMergeTool
C:\jbdevstudio\jboss-eap\jboss-as\bin\Core\core\data
C:\jbdevstudio\jboss-eap\jboss-as\bin\Core\core1\data\index
C:\jbdevstudio\jboss-eap\jboss-as\bin\Core\core2\data\index

After executing the above command I got this output:
Merging...
Optimizing...
Done.

The core data folder contains the files _0.cfs, segments.gen and segments_2.
But when I check the results from the merged data, the response comes back
with zero results / no documents found.

I am using lucene-core-2.9-dev.jar and lucene-misc-2.4.1.jar  files

Please help resolve the issue.
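
For reference, my understanding is that the same merge can also be done
programmatically with Lucene's IndexWriter; the sketch below reuses the
directories from the command above, and the exact paths plus the Lucene
2.4-style API calls are assumptions on my part:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ProgrammaticMerge {
    public static void main(String[] args) throws Exception {
        Directory target = FSDirectory.getDirectory("C:/jbdevstudio/jboss-eap/jboss-as/bin/Core/core/data/index");
        Directory core1  = FSDirectory.getDirectory("C:/jbdevstudio/jboss-eap/jboss-as/bin/Core/core1/data/index");
        Directory core2  = FSDirectory.getDirectory("C:/jbdevstudio/jboss-eap/jboss-as/bin/Core/core2/data/index");

        // create (or overwrite) the target index and pull in the two source indexes
        IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        writer.addIndexesNoOptimize(new Directory[]{core1, core2});
        writer.optimize();
        writer.close();
    }
}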

Thanks in advance
Jay


Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread anuvenk

I've posted a few questions on synonyms before and finally understood how it
worked and settled with index-time synonyms. Seems to work much better than
query time synonyms. But now @ my work, they have a special request. They
want certain synonyms to be applied only to certain sections of the index.
For example, we have legal faqs, forms etc and we have attorneys in our
index.
The following synonyms for example,
california,san diego
florida,miami
So for a search like 'real estate san diego', it makes sense to return all faqs
and forms for 'california' in the index, but it doesn't make sense to return a
real estate attorney elsewhere in california (like burbank) instead of just
restricting to san diego attorneys.
To be clearer: I want to be able to return all california faqs and forms for
'real estate san diego' but not all california attorneys for the same search.
That means I should index the faqs and forms with the state-to-city mappings
as above, but not the attorneys.
Well I could index all other resources like faqs, forms first with these
synonyms, then remove them and index attorneys. But that wouldn't work well
in my case because we have a scheduler set up that runs every night to index
any new resources from our database.
Can someone suggest a good solution for this?




-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24209490.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread rswart

What is stopping you from defining different field types for faqs and
attorneys? One with index-time synonyms and one without.
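
Something along these lines in schema.xml; just a sketch, where the analyzer
chains and field names are placeholders, and the point is that only the
faq/form field uses the index-time SynonymFilterFactory:

  <!-- used by faq/form content: synonyms applied at index time -->
  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- used by attorney content: no synonym expansion -->
  <fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="faq_text"      type="text_syn"   indexed="true" stored="true"/>
  <field name="attorney_text" type="text_plain" indexed="true" stored="true"/>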



anuvenk wrote:
 
 I've posted a few questions on synonyms before and finally understood how
 it worked and settled with index-time synonyms. Seems to work much better
 than query time synonyms. But now @ my work, they have a special request.
 They want certain synonyms to be applied only to certain sections of the
 index.
 For example, we have legal faqs, forms etc and we have attorneys in our
 index.
 The following synonyms for example,
 california,san diego
 florida,miami
 So for a search 'real estate san diego', it makes sense to return all
 faqs, forms for 'california' in the index but doesn't make sense to return
 a real estate attorney elsewhere in california (like burbank) besides just
 restricting to san diego attorneys.
 To be more clear I want to be able to return all california faqs  forms
 for 'real estate san diego' but not all california attorneys for the same.
 That means, i should index the faqs, forms with the state = city mappings
 as above but not for attorneys.
 Well I could index all other resources like faqs, forms first with these
 synonyms, then remove them and index attorneys. But that wouldn't work
 well in my case because we have a scheduler set up that runs every night
 to index any new resources from our database.
 Can someone suggest a good solution for this?
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24210694.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is it possible to apply index-time synonyms just for a section of the index

2009-06-25 Thread anuvenk

That's right. Simple. I can very well do that. Why didn't I think of it.
Thanks.

rswart wrote:
 
 What is stopping you from defining different field types for faqs and
 attorneys? One with index time synomyms and one without.
 
 
 
 anuvenk wrote:
 
 I've posted a few questions on synonyms before and finally understood how
 it worked and settled with index-time synonyms. Seems to work much better
 than query time synonyms. But now @ my work, they have a special request.
 They want certain synonyms to be applied only to certain sections of the
 index.
 For example, we have legal faqs, forms etc and we have attorneys in our
 index.
 The following synonyms for example,
 california,san diego
 florida,miami
 So for a search 'real estate san diego', it makes sense to return all
 faqs, forms for 'california' in the index but doesn't make sense to
 return a real estate attorney elsewhere in california (like burbank)
 besides just restricting to san diego attorneys.
 To be more clear I want to be able to return all california faqs  forms
 for 'real estate san diego' but not all california attorneys for the
 same. That means, i should index the faqs, forms with the state = city
 mappings as above but not for attorneys.
 Well I could index all other resources like faqs, forms first with these
 synonyms, then remove them and index attorneys. But that wouldn't work
 well in my case because we have a scheduler set up that runs every night
 to index any new resources from our database.
 Can someone suggest a good solution for this?
 
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24210788.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-25 Thread Bradford Stephens
Hey all,

Just writing a quick note of thanks, we had another solid group of
people show up! As always, we learned quite a lot about interesting
use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud
Stack'.

 I couldn't get it taped, but we talked about:

-Scaling Lucene with Katta and the Katta infrastructure
-the need for low-latency BI on distributed document stores
-Lots and lots of detail on Amazon Elastic MapReduce

We'll be doing it again next month --  July 29th.

On Mon, Jun 22, 2009 at 5:40 PM, Bradford
Stephensbradfordsteph...@gmail.com wrote:
 Hey all, just a friendly reminder that this is Wednesday! I hope to see
 everyone there again. Please let me know if there's something interesting
 you'd like to talk about -- I'll help however I can. You don't even need a
 Powerpoint presentation -- there's many whiteboards. I'll try to have a
 video cam, but no promises.
 Feel free to call at 904-415-3009 if you need directions or any questions :)
 ~~`
 Greetings,

 On the heels of our smashing success last month, we're going to be
 convening the Pacific Northwest (Oregon and Washington)
 Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the
 24th.  The meeting should start at 6:45, organized chats will end
 around  8:00, and then there shall be discussion and socializing :)

 The meeting will be at the University of Washington in
 Seattle again. It's in the Computer Science building (not electrical
 engineering!), room 303, located
 here: http://www.washington.edu/home/maps/southcentral.html?80,70,792,660

 If you've ever wanted to learn more about distributed computing, or
 just see how other people are innovating with Hadoop, you can't miss
 this opportunity. Our focus is on learning and education, so every
 presentation must end with a few questions for the group to research
 and discuss. (But if you're an introvert, we won't mind).

 The format is two or three 15-minute deep dive talks, followed by
 several 5 minute lightning chats. We had a few interesting topics
 last month:

 -Building a Social Media Analysis company on the Apache Cloud Stack
 -Cancer detection in images using Hadoop
 -Real-time OLAP on HBase -- is it possible?
 -Video and Network Flow Analysis in Hadoop vs. Distributed RDBMS
 -Custom Ranking in Lucene

 We already have one deep dive scheduled this month, on truly
 scalable Lucene with Katta. If you've been looking for a way to handle
 those large Lucene indices, this is a must-attend!

 Looking forward to seeing everyone there again.

 Cheers,
 Bradford

 http://www.roadtofailure.com -- The Fringes of Distributed Computing,
 Computer Science, and Social Media.


Re: Solr document security

2009-06-25 Thread pof

That's what I was going to do originally; however, what is stopping a user from
simply running a search through http://localhost:8983/solr/admin/ on the
index server?


Norberto Meijome-6 wrote:
 
 On Wed, 24 Jun 2009 23:20:26 -0700 (PDT)
 pof melbournebeerba...@gmail.com wrote:
 
 
 Hi, I am wanting to add document-level security that works as following:
 An
 external process makes a query to the index, depending on their security
 allowences based of a login id a list of hits are returned minus any the
 user are meant to know even exist. I was thinking maybe a custom filter
 with
 a JDBC connection to check security of the user vs. the document. I'm not
 sure how I would add the filter or how to write the filter or how to get
 the
 login id from a GET parameter. Any suggestions, comments etc.?
 
 Hi Brett,
 (keeping in mind that i've been away from SOLR for 8 months, but i
 dont think this was added of late)
 
 standard approach is to manage security @ your
 application layer, not @ SOLR. ie, search, return documents (which should
 contain some kind of data to identify their ACL ) and then you can decide
 whether to show it or not. 
 
 HIH
 _
 {Beto|Norberto|Numard} Meijome
 
 They never open their mouths without subtracting from the sum of human
 knowledge. Thomas Brackett Reed
 
 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet.
 Reading disclaimers makes you go blind. Writing them is worse. You have
 been
 Warned.
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-document-security-tp24197620p24212752.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr document security

2009-06-25 Thread Otis Gospodnetic

That URL to your Solr Admin page should never be exposed to the outside world.
You can play with network, routing, DNS and other similar things to make sure
no one can get to it from the outside even if the URL is known.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: pof melbournebeerba...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 25, 2009 7:40:12 PM
 Subject: Re: Solr document security
 
 
 Thats what I was going to do originally, however what is stopping a user from
 simply running a search through http://localhost:8983/solr/admin/ of the
 index server?
 
 
 Norberto Meijome-6 wrote:
  
  On Wed, 24 Jun 2009 23:20:26 -0700 (PDT)
  pof wrote:
  
  
  Hi, I am wanting to add document-level security that works as following:
  An
  external process makes a query to the index, depending on their security
  allowences based of a login id a list of hits are returned minus any the
  user are meant to know even exist. I was thinking maybe a custom filter
  with
  a JDBC connection to check security of the user vs. the document. I'm not
  sure how I would add the filter or how to write the filter or how to get
  the
  login id from a GET parameter. Any suggestions, comments etc.?
  
  Hi Brett,
  (keeping in mind that i've been away from SOLR for 8 months, but i
  dont think this was added of late)
  
  standard approach is to manage security @ your
  application layer, not @ SOLR. ie, search, return documents (which should
  contain some kind of data to identify their ACL ) and then you can decide
  whether to show it or not. 
  
  HIH
  _
  {Beto|Norberto|Numard} Meijome
  
  They never open their mouths without subtracting from the sum of human
  knowledge. Thomas Brackett Reed
  
  I speak for myself, not my employer. Contents may be hot. Slippery when
  wet.
  Reading disclaimers makes you go blind. Writing them is worse. You have
  been
  Warned.
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Solr-document-security-tp24197620p24212752.html
 Sent from the Solr - User mailing list archive at Nabble.com.