Re: MorphlineSolrSink

2013-07-15 Thread Israel Ekpo
Rajesh,

I think this question is better suited for the FLUME user mailing list.

You will need to configure the sink with the expected values so that the
events from the channels are routed to the right place.
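For reference, a MorphlineSolrSink is typically wired up in the Flume agent's properties file along the lines of the sketch below; the agent, channel, and file names here are illustrative assumptions, not taken from the setup described in this thread.

```properties
# Sketch of a MorphlineSolrSink definition (agent/channel/file names are illustrative).
agent.sinks = solrSink
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = fileChannel
agent.sinks.solrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.solrSink.batchSize = 100
agent.sinks.solrSink.batchDurationMillis = 1000
```

The morphline file referenced here is where the parsing and loading into Solr is actually defined.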

On Mon, Jul 15, 2013 at 4:49 PM, Rajesh Jain rjai...@gmail.com wrote:

 Newbie question:

 I have a Flume server, where I am writing to a sink which is a RollingFile
 Sink.

 I have to take these files from the sink and send them to Solr, which can
 index them and provide search.

 Do I need to configure MorphlineSolrSink?

 What is the mechanism to do this or to send this data over to Solr?

 Thanks,
 Rajesh




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-23 Thread Israel Ekpo
I am working on that, I hope to have an answer within a month or so.

On Tue, Jun 21, 2011 at 9:51 AM, roySolr royrutten1...@gmail.com wrote:

 Are you working on some changes to support earlier versions of PHP?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ANNOUNCEMENT-PHP-Solr-Extension-1-0-1-Stable-Has-Been-Released-tp3024040p3090702.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-11 Thread Israel Ekpo
It looks like you have to upgrade to PHP 5.3.x.

Unfortunately, that method signature was different in that version of PHP.

I would have to make additional changes to support the earlier versions of
PHP.

On Tue, Jun 7, 2011 at 9:05 AM, roySolr royrutten1...@gmail.com wrote:

 Hello,

 I have some problems with the installation of the new PECL package
 solr-1.0.1.

 I run these commands:

 pecl uninstall solr-beta  (to uninstall the old version, 0.9.11)
 pecl install solr

 The installation runs, but then it gives the following error message:

 /tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c: In function
 'solr_json_to_php_native':
 /tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c:1123: error: too many
 arguments to function 'php_json_decode'
 make: *** [solr_functions_helpers.lo] Error 1
 ERROR: `make' failed

 I have php version 5.2.17.

 How can I fix this?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ANNOUNCEMENT-PHP-Solr-Extension-1-0-1-Stable-Has-Been-Released-tp3024040p3034350.html
 Sent from the Solr - User mailing list archive at Nabble.com.






[ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-04 Thread Israel Ekpo
The new PECL package solr-1.0.1 (stable) has been released at
http://pecl.php.net/.

Release notes
-
- Added support for json response writer in SolrClient
- Removed final bit from classes so that they can be mocked in unit tests
- Changed from beta to stable
- Included phpdoc stubs in source to enable autocomplete of Solr classes and
methods in IDE during development
- Lowered libxml2 version requirement to 2.6.16

Package Info
-
The extension effectively simplifies the process of interacting with Apache
Solr using PHP5, and it already comes with built-in readiness for the latest
features added in Solr 3.1. The extension has features such as built-in,
serializable query string builder objects which effectively simplify the
manipulation of name-value pair request parameters across repeated requests.
The response from the Solr server is also automatically parsed into native PHP
objects whose properties can be accessed as array keys or object properties
without any additional configuration on the client side. Its advanced HTTP
client reuses the same connection across multiple requests and provides
built-in support for connecting to Solr servers secured behind HTTP
Authentication or HTTP proxy servers. It is also able to connect to
SSL-enabled containers. Please consult the documentation for more details on
the features. Included in the source code are phpdoc stubs that enable
autocompletion of Solr classes and methods in IDEs during development.
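A minimal usage sketch follows, assuming the extension is installed and a Solr server is listening on localhost:8983; the hostname, port, query terms, and field names are illustrative, and the `setResponseWriter('json')` call reflects the json support added in this release.

```php
<?php
// Connection options for the SolrClient (values are illustrative).
$options = array(
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr',
);

$client = new SolrClient($options);

// New in 1.0.1: ask the server for a json response instead of xml/phpnative.
$client->setResponseWriter('json');

// Serializable query string builder object.
$query = new SolrQuery();
$query->setQuery('lucene');
$query->setStart(0);
$query->setRows(10);
$query->addField('id')->addField('title');

// The response is automatically parsed into native PHP objects.
$response = $client->query($query);
$result   = $response->getResponse();
echo $result->response->numFound;
```

Because the response is parsed for you, `$result->response->numFound` and `$result['response']['numFound']` are interchangeable.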

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-1.0.1.tgz

Authors
-
Israel Ekpo ie...@php.net (lead)



Re: PECL SOLR PHP extension, JSON output

2011-05-08 Thread Israel Ekpo
There are instructions here for Solr 1.4

https://issues.apache.org/jira/browse/SOLR-1967

I have not yet finished the version of the plugin that will allow you to use
phpnative in 3.1.

I will post it as soon as I can.

I have not been working on the PECL extension for a while now, but I am
planning to modify the source to include support for the JSON response writer
soon.

Stay tuned.

On Thu, Apr 21, 2011 at 9:47 AM, Ralf Kraus r...@pixelhouse.de wrote:

 On 21.04.2011 13:58, roySolr wrote:

  I have tried that but it seems like JSON is not supported

 Parameters

 responseWriter

 One of the following:

 - xml
 - phpnative





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/PECL-SOLR-PHP-extension-JSON-output-tp2846092p2846728.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 And I can't get phpnative working with SOLR 3.1 :-(

 --
 Greets,
 Ralf Kraus






Re: phpnative response writer in SOLR 3.1 ?

2011-05-08 Thread Israel Ekpo
Sorry for the late response.

I am working on an updated version for the latest release of Solr and Lucene

I will post my changes soon within the week.

Thank you for your patience.

On Fri, Apr 15, 2011 at 3:20 AM, Ralf Kraus r...@pixelhouse.de wrote:

 On 14.04.2011 09:53, Ralf Kraus wrote:

  Hello,

 I just updated to SOLR 3.1 and am wondering if the phpnative response writer
 plugin is part of it?
 ( https://issues.apache.org/jira/browse/SOLR-1967 )

 When I try to compile the sources files I get some errors :

 PHPNativeResponseWriter.java:57:
 org.apache.solr.request.PHPNativeResponseWriter is not abstract and does not
 override abstract method
 getContentType(org.apache.solr.request.SolrQueryRequest,org.apache.solr.response.SolrQueryResponse)
 in org.apache.solr.response.QueryResponseWriter
 public class PHPNativeResponseWriter implements QueryResponseWriter {
   ^
 PHPNativeResponseWriter.java:70: method does not override a method from
 its superclass
@Override
 ^

 Is there a new JAR file or something I could use with SOLR 3.1? Because
 the SOLR pecl package only uses XML or PHPNATIVE as response writer (
 http://pecl.php.net/package/solr )


 No hints at all ?

 --
 Greetings,
 Ralf Kraus






Re: Solr Php Client

2011-04-09 Thread Israel Ekpo
Cool.

I will take a look at the issue later tomorrow.

On Fri, Apr 8, 2011 at 2:28 AM, Haspadar haspa...@gmail.com wrote:

 I'm entering only a query parameter.
 I posted a bug description there -
 http://pecl.php.net/bugs/bug.php?id=22634


 2011/4/8 Israel Ekpo israele...@gmail.com

  Hi,
 
  Could you send the entire list of parameters you are sending to Solr via
  the SolrClient and SolrQuery object?
 
  Please open a bug report here with the details:
 
  http://pecl.php.net/bugs/report.php?package=solr
 
  On Thu, Apr 7, 2011 at 7:59 PM, Haspadar haspa...@gmail.com wrote:
 
   Hello
   I updated Solr to version 3.1 on my project. And now when the
 application
   calls getResponse () method (PECL extension) I get the following:
   Fatal error: Uncaught exception 'SolrException' with message 'Error
   un-serializing response' in /home/.../Adapter/Solr.php: 78
  
   How can I fix it?
  
   Thanks
  
 
 
 
 






Re: Solr Php Client

2011-04-07 Thread Israel Ekpo
Hi,

Could you send the entire list of parameters you are sending to Solr via the
SolrClient and SolrQuery object?

Please open a bug report here with the details:

http://pecl.php.net/bugs/report.php?package=solr

On Thu, Apr 7, 2011 at 7:59 PM, Haspadar haspa...@gmail.com wrote:

 Hello
 I updated Solr to version 3.1 on my project. And now when the application
 calls the getResponse() method (PECL extension) I get the following:
 Fatal error: Uncaught exception 'SolrException' with message 'Error
 un-serializing response' in /home/.../Adapter/Solr.php: 78

 How can I fix it?

 Thanks






Re: New PHP API for Solr (Logic Solr API)

2011-03-26 Thread Israel Ekpo
Lukas,

How do you think it should have been designed?

Most libraries are not going to have all the features that you need and
while there may be features about the library that you do not like others
may really appreciate them being there.

As I said in an earlier email a couple of months ago, the SolrQuery::set(),
get() and add() methods do exist for you to use if you prefer not to use the
feature-specific methods in the SolrQuery class; that's the beauty of it.

The PECL extension was something I designed to use on a personal project, and
it was really helpful in managing faceted search and other features that Solr
has to offer. I decided to share it with the PHP community because I felt
others might need similar functionality. So it is possible that there were
use cases that applied to my project that may not be applicable to yours.

I initially used the SolrJ API to access Solr via Java, and then when I had a
PHP project I decided to use something similar to SolrJ, but at the time there
was nothing similar in the PHP realm:

http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/package-summary.html

A review of the SolrJ API will offer more explanations on some of the
features present in the PECL API

I would really love to get feedback from others about the design of the PECL
library and about any missing or extraneous features.

Thanks.

On Mon, Mar 7, 2011 at 4:04 AM, Lukas Kahwe Smith m...@pooteeweet.org wrote:


 On 07.03.2011, at 09:43, Stefan Matheis wrote:

  Burak,
 
  what's wrong with the existing PHP-Extension
  (http://php.net/manual/en/book.solr.php)?


 the main issue I see with it is that the API isn't designed much, aka it
 just exposes lots of features with dedicated methods but doesn't focus on
 keeping the API easy to overview (aka keep simple things simple and make
 complex stuff possible). At the same time, fundamental stuff like quoting is
 not covered.

 that being said, I do not think we really need a proliferation of Solr APIs
 for PHP, even if this one is based on PHP 5.3 (namespaces etc). BTW, there is
 already another PHP 5.3 based API, though it tries to also unify other
 Lucene-based APIs as much as possible:
 https://github.com/dstendardi/Ariadne

 regards,
 Lukas Kahwe Smith
 m...@pooteeweet.org








Re: solr init.d script

2010-11-09 Thread Israel Ekpo
I think it would be a better idea to load Solr via a servlet container like
Tomcat and then create the init.d script for Tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6
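The deployment the wiki page describes boils down to roughly the commands below; the paths and the Solr/Tomcat versions are illustrative assumptions, so adjust them to your installation.

```shell
# Deploy the Solr war into Tomcat and point it at a Solr home (illustrative paths).
cp apache-solr-1.4.0/dist/apache-solr-1.4.0.war /usr/local/tomcat/webapps/solr.war
mkdir -p /usr/local/solr
cp -r apache-solr-1.4.0/example/solr/* /usr/local/solr/

# Tell Tomcat where the Solr home is, then start it via its own init script.
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/local/solr"
/etc/init.d/tomcat6 start
```

Once Solr is a webapp inside Tomcat, starting, stopping, and chkconfig registration are all handled by the container's standard init script.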

On Tue, Nov 9, 2010 at 2:47 AM, Eric Martin e...@makethembite.com wrote:

 Er, what flavor?

 RHEL / CentOS

 #!/bin/sh

 # Starts, stops, and restarts Apache Solr.
 #
 # chkconfig: 35 92 08
 # description: Starts and stops Apache Solr

 SOLR_DIR=/var/solr
 JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
 LOG_FILE=/var/log/solr.log
 JAVA=/usr/bin/java

 case "$1" in
     start)
         echo "Starting Solr"
         cd $SOLR_DIR
         $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
         ;;
     stop)
         echo "Stopping Solr"
         cd $SOLR_DIR
         $JAVA $JAVA_OPTIONS --stop
         ;;
     restart)
         $0 stop
         sleep 1
         $0 start
         ;;
     *)
         echo "Usage: $0 {start|stop|restart}" >&2
         exit 1
         ;;
 esac

 


 Debian

 http://xdeb.org/node/1213

 __

 Ubuntu

 STEPS
 Type the following command in TERMINAL to install the nano text editor:
 sudo apt-get install nano
 Type the following command in TERMINAL to add a new script:
 sudo nano /etc/init.d/solr
 TERMINAL will display a new page titled GNU nano 2.0.x.
 Paste the script below into this TERMINAL window.
 #!/bin/sh -e

 # Starts, stops, and restarts solr

 SOLR_DIR=/apache-solr-1.4.0/example
 JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
 LOG_FILE=/var/log/solr.log
 JAVA=/usr/bin/java

 case "$1" in
     start)
         echo "Starting Solr"
         cd $SOLR_DIR
         $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
         ;;
     stop)
         echo "Stopping Solr"
         cd $SOLR_DIR
         $JAVA $JAVA_OPTIONS --stop
         ;;
     restart)
         $0 stop
         sleep 1
         $0 start
         ;;
     *)
         echo "Usage: $0 {start|stop|restart}" >&2
         exit 1
         ;;
 esac
 Note: In the above script you might have to replace /apache-solr-1.4.0/example
 with the appropriate directory name.
 Press the CTRL-X keys.
 Type in Y.
 When asked for the File Name to Write, press the ENTER key.
 You're now back at the TERMINAL command line.

 Type the following command in TERMINAL to create all the links to the
 script:
 sudo update-rc.d solr defaults
 Type the following command in TERMINAL to make the script executable:
 sudo chmod a+rx /etc/init.d/solr
 To test, reboot your Ubuntu Server.
 Wait until the Ubuntu Server reboot is completed.
 Wait 2 minutes for Apache Solr to start up.
 Using your internet browser, go to your website and try a Solr search.



 -Original Message-
 From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
 Sent: Monday, November 08, 2010 11:42 PM
 To: solr-user@lucene.apache.org
 Subject: solr init.d script

 Hi,

 Does anyone have some kind of init.d script for solr, that can start,
 stop and check solr status?

 --
 Nikola Garafolic
 SRCE, Sveucilisni racunski centar
 tel: +385 1 6165 804
 email: nikola.garafo...@srce.hr






Re: solr init.d script

2010-11-09 Thread Israel Ekpo
Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
nikola.garafo...@srce.hr wrote:

 I have two nodes running one jboss server each and using one (single) solr
 instance; that's how I run it for now.

 Do you recommend running jboss with solr via a servlet container? The two
 jboss instances run in load-balancing mode for high availability purposes.

 For now it seems to be ok.


 On 11/09/2010 03:17 PM, Israel Ekpo wrote:

 I think it would be a better idea to load solr via a servlet container
 like
 Tomcat and then create the init.d script for tomcat instead.

 http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6


 --
 Nikola Garafolic
 SRCE, Sveucilisni racunski centar
 tel: +385 1 6165 804
 email: nikola.garafo...@srce.hr






ZendCon 2010 - Slides on Building Intelligent Search Applications with Apache Solr and PHP 5

2010-11-03 Thread Israel Ekpo
Due to popular demand, the link to my slides @ ZendCon is now available here
in case anyone else is looking for it.

http://slidesha.re/bAXNF3

The sample code will be uploaded shortly.

Feedback is also appreciated

http://joind.in/2261



Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Israel Ekpo
I think you may want to configure the field type used for the spell check to
use the synonyms file/database.

That way synonyms are also processed at index time.

This could help.
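For example, a spellcheck field type that applies synonyms at index time might look like the following in schema.xml; the type name, synonyms file name, and filter choices are illustrative, not taken from the poster's schema.

```xml
<!-- Sketch: apply synonyms while indexing the spellcheck field,
     but not at query time (names and files are illustrative). -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With `expand="true"`, each term and its synonyms are all indexed, so the spellcheck dictionary built from this field will contain the related terms.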

On Wed, Oct 27, 2010 at 6:47 AM, Antonio Calo' anton.c...@gmail.com wrote:

 Hi

 If I understood correctly, you will build a kind of dictionary or ontology or
 thesaurus, and you will use it if Solr query results are few. At query time
 (before or after) you will perform a query on this dictionary in order to
 retrieve the suggested word.

 If you need to do this, you can try to create a custom request handler
 where you can control the querying process in a simple manner (
 http://wiki.apache.org/solr/SolrRequestHandler).

 With the custom request handler, you can add custom code to check query
 results before submitting the query to Solr, or to analyze the query before
 sending results to the client. I never coded one, but I think this is a good
 starting point.

 Hope this can help you

 Antonio



 On 27/10/2010 11.03, Pablo Recio wrote:

  Thanks, it's not what I'm looking for.

 Actually I need something like: search for "Ubuntu" and it will prompt
 "Maybe you will like 'Debian' too", or something like that. I'm not trying
 to do it automatically; manually will be ok.

 Anyway, it is a good article you shared; maybe I will implement it, thanks!

 2010/10/27 Jakub Godawa jakub.god...@gmail.com

  I am a real rookie at solr, but try this:
 http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en

 2010/10/27 Pablo Recio pre...@yaco.es

  Hi,

 I don't want to be annoying, but I'm looking for a way to do that.

 I repeat the question: is there a way to implement Search Suggestion
 manually?

 Thanks in advance.
 Regards,

  2010/10/18 Pablo Recio Quijano pre...@yaco.es

  Hi!

  I'm trying to implement some kind of Search Suggestion on a search engine I
  have implemented. These search suggestions should not be automatic like the
  one described for the SpellCheckComponent [1]. I'm looking for something
  like:

  SAS oppositions => Public job offers for some-company

  So I will have to define it manually. I was thinking about synonyms [2] but
  I don't know if it's the proper way to do it, because semantically those
  terms are not synonyms.

  Any ideas or suggestions?

  Regards,

  [1] http://wiki.apache.org/solr/SpellCheckComponent
  [2]
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory







Re: Highlighting for non-stored fields

2010-10-26 Thread Israel Ekpo
Check out this link

http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature.

If you need to retrieve and display the highlighted snippets, then the field
definitely needs to be stored.

To use term offsets, it is a good idea to enable the following attributes for
that field: termVectors, termPositions and termOffsets.

The only issue here is that your storage costs will increase because of
these extra features.

Nevertheless, you definitely need to store the field if you need to retrieve
it for highlighting purposes.
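Putting that together, the schema.xml declaration for a highlightable field would look something like the sketch below; the field and type names are illustrative, not taken from the poster's schema.

```xml
<!-- Sketch: a stored field with term vector data so the highlighter
     can use offsets instead of re-analyzing the text. -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

The termVectors/termPositions/termOffsets attributes trade extra index size for faster snippet generation on large text fields.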

On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

 Hi,

 I've been looking thru the mailing archive for the past week and I haven't
 found any useful info regarding this issue.

 My requirement is to index a few terabytes worth of data to be searched.
 Due to the size of the data, I would like to index without storing but I
 would like to use the highlighting feature.  Is this even possible?  What
 are my options?

 I've read about termOffsets, payload that could possibly be used to do this
 but I have no idea how this could be done.

 Any pointers greatly appreciated.  Someone please point me in the right
 direction.

  I don't mind having to write some code or digging thru existing code to
 accomplish this task.

 Thanks,
 P.






Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Israel Ekpo
The Solr home is set via the -Dsolr.solr.home Java system property.

Also make sure that -Dsolr.data.dir is defined for your data directory, if it
is not already defined in the solrconfig.xml file.
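With the example Jetty start script, both properties can be passed on the command line; the paths below are illustrative.

```shell
# Start Solr with an explicit home and data directory (illustrative paths).
java -Dsolr.solr.home=/var/solr -Dsolr.data.dir=/var/solr/data -jar start.jar
```

Pinning these explicitly avoids the index silently ending up in a directory relative to wherever the process happened to be started.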

On Tue, Oct 26, 2010 at 10:46 AM, Upayavira u...@odoko.co.uk wrote:

 You need to watch what you are setting your solr.home to. That is where
 your indexes are being written. Are they getting overwritten/lost
 somehow. Watch the files in that dir while doing a restart.

 That's a start at least.

 Upayavira

 On Tue, 26 Oct 2010 16:40 +0300, Mackram Raydan mack...@gmail.com
 wrote:
  Hey everyone,
 
  I apologize if this question is rudimentary but it is getting to me and
  I did not find anything reasonable about it online.
 
  So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the
  SolrTomcat wiki page to setup. The system works exactly the way I want
  it (proper search, highlighting, etc...). The problem however is when I
  restart my Tomcat server all the data in Solr (ie the index) is simply
  lost. The admin shows me the number of docs is 0 when it was before in
  the thousands.
 
  Can someone please help me understand why the above is happening and how
  can I workaround it if possible?
 
  Big thanks for any help you can send my way.
 
  Regards,
 
  Mackram
 






Re: Modelling Access Control

2010-10-25 Thread Israel Ekpo
On Mon, Oct 25, 2010 at 8:16 AM, Paul Carey paul.p.ca...@gmail.com wrote:

 Many thanks for all the responses. I now plan on benchmarking and
 validating both the filter query approach, and maintaining the ACL
 entirely outside of Solr. I'll decide from there.

 Paul



Great.

I am looking forward to some feedback on the benchmarks.


Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi Paul,

Regardless of how you implement it, I would recommend you use filter queries
for the permissions check rather than making it part of the main query.

On Sat, Oct 23, 2010 at 4:03 AM, Paul Carey paul.p.ca...@gmail.com wrote:

 Hi

 My domain model is made of users that have access to projects which
 are composed of items. I'm hoping to use Solr and would like to make
 sure that searches only return results for items that users have
 access to.

 I've looked over some of the older posts on this mailing list about
 access control and saw a suggestion along the lines of
 acl:user_id AND (actual query).

 While this obviously works, there are a couple of niggles. Every item
 must have a list of valid user ids (typically less than 100 in my
 case). Every time a collaborator is added to or removed from a
 project, I need to update every item in that project. This will
 typically be fewer than 1000 items, so I guess it is no big deal.

 I wondered if the following might be a reasonable alternative,
 assuming the number of projects to which a user has access is lower
 than a certain bound.
 (acl:project_id OR acl:project_id OR ... ) AND (actual query)

 When the numbers are small - e.g. each user has access to ~20 projects
 and each project has ~20 collaborators - is one approach preferable
 over another? And when outliers exist - e.g. a project with 2000
 collaborators, or a user with access to 2000 projects - is one
 approach more liable to fail than the other?

 Many thanks

 Paul






Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi All,

I think using filter queries will be a good option to consider because of
the following reasons

* The filter query does not affect the score of the items in the result set.
If the ACL logic is part of the main query, it could influence the scores of
the items in the result set.

* Using a filter query could lead to better performance in complex queries
because the results of the query specified with fq are cached independently
from those of the main query. Since the result of a filter query is cached, it
will be used to filter the primary query result using set intersection without
having to fetch the ids of the documents from the fq a second time.

I think this will be useful because we can assume that the ACL portion in the
fq is relatively constant, since the permissions for each user are not
something that changes frequently.

http://wiki.apache.org/solr/FilterQueryGuidance
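With the PECL extension, for instance, the ACL check would go into a filter query rather than the main query; the field name and project ids below are illustrative, not taken from Paul's schema.

```php
<?php
// The main query carries only the user's search terms; relevance scoring
// is unaffected by the ACL because the ACL lives in a filter query.
$query = new SolrQuery('actual query terms');

// Cached independently of the main query and intersected with its results.
$query->addFilterQuery('acl:(project_17 OR project_42 OR project_99)');
```

Because the fq result set is cached per unique filter string, a user's relatively stable project list is reused across all of that user's searches.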


On Sat, Oct 23, 2010 at 2:58 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 why use filter queries?

 Wouldn't reducing the set headed into the filters by putting it in the main
 query be faster? (A question to learn, since I do NOT know :-)

 Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make them
 yourself. from '
  http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

 EARTH has a Right To Life,
  otherwise we all die.


 --- On Sat, 10/23/10, Israel Ekpo israele...@gmail.com wrote:

  From: Israel Ekpo israele...@gmail.com
  Subject: Re: Modelling Access Control
  To: solr-user@lucene.apache.org
  Date: Saturday, October 23, 2010, 7:01 AM

  Hi Paul,

  Regardless of how you implement it, I would recommend you use filter
  queries for the permissions check rather than making it part of the main
  query.
 
  On Sat, Oct 23, 2010 at 4:03 AM, Paul Carey paul.p.ca...@gmail.com
  wrote:
 
    Hi

    My domain model is made of users that have access to projects which
    are composed of items. I'm hoping to use Solr and would like to make
    sure that searches only return results for items that users have
    access to.

    I've looked over some of the older posts on this mailing list about
    access control and saw a suggestion along the lines of
    acl:user_id AND (actual query).

    While this obviously works, there are a couple of niggles. Every item
    must have a list of valid user ids (typically less than 100 in my
    case). Every time a collaborator is added to or removed from a
    project, I need to update every item in that project. This will
    typically be fewer than 1000 items, so I guess it is no big deal.

    I wondered if the following might be a reasonable alternative,
    assuming the number of projects to which a user has access is lower
    than a certain bound.
    (acl:project_id OR acl:project_id OR ... ) AND (actual query)

    When the numbers are small - e.g. each user has access to ~20 projects
    and each project has ~20 collaborators - is one approach preferable
    over another? And when outliers exist - e.g. a project with 2000
    collaborators, or a user with access to 2000 projects - is one
    approach more liable to fail than the other?

    Many thanks

    Paul
 
 
 
 






Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-19 Thread Israel Ekpo
Thanks Otis and Markus for your input.

I will check it out today.
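The work-around described in the quoted message below can be sketched in PHP. Note that PHP's built-in levenshtein() only accepts strings up to 255 characters, so the sketch compares a 255-character prefix rather than the full 1000; the function name and threshold are illustrative.

```php
<?php
// Strip a page's leading characters when they are close, by levenshtein
// distance, to the site-wide header (threshold is illustrative).
function strip_common_header($content, $header, $threshold = 50)
{
    // PHP's levenshtein() is limited to 255-character inputs.
    $prefix = substr($content, 0, 255);
    $head   = substr($header, 0, 255);

    if (levenshtein($prefix, $head) < $threshold) {
        return substr($content, strlen($header));
    }

    return $content;
}
```

The same function can be applied to the reversed tail of the document to strip the common footer.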

On Tue, Oct 19, 2010 at 4:45 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be
 upgraded to Tika 0.8 (when it's released, or just the current trunk). Also,
 the Boilerpipe API needs to be exposed through Nutch configuration: which
 extractor can be used, which parameters need to be set, etc.

 Upgrading to Tika's trunk might be relatively easy, but exposing Boilerpipe
 surely isn't.

 On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:
  Hi Israel,
 
  You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
  Not sure if it's built into Nutch, though...
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  - Original Message 
 
   From: Israel Ekpo israele...@gmail.com
   To: solr-user@lucene.apache.org; u...@nutch.apache.org
   Sent: Mon, October 18, 2010 9:01:50 PM
   Subject: Removing Common Web Page Header and Footer from All Content
   Fetched by Nutch
  
   Hi All,

   I am indexing a web application with approximately 9500 distinct URLs and
   contents using Nutch and Solr.

   I use Nutch to fetch the urls and links, and to crawl the entire web
   application to extract all the content for all pages.

   Then I run the solrindex command to send the content to Solr.

   The problem that I have now is that the first 1000 or so characters of
   some pages and the last 400 characters of the pages are showing up in
   the search results.

   These are the contents of the common header and footer used in the site
   respectively.

   The only work-around that I have now is to index everything and then go
   through each document one at a time to remove the first 1000 characters
   if the levenshtein distance between the first 1000 characters of the
   page and the common header is less than a certain value. The same
   applies to the footer content common to all pages.

   Is there a way to ignore certain "stop phrases", so to speak, in the
   Nutch configuration, based on levenshtein distance or jaro winkler
   distance, so that certain parts of the fetched data that match these
   stop phrases will not be parsed?

   Any useful pointers would be highly appreciated.

   Thanks in advance.

 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536600 / 06-50258350






Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-19 Thread Israel Ekpo
Hi All,

Just wanted to post an update on where we stand with all the requests for
new features


List of Features Requested In SOLR PECL Extension

1. Ability to send custom requests to custom URLs other than select, update,
terms, etc.
2. Ability to add files (pdf, office documents etc)
3. Windows version of latest releases.
4. Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al
returns an array consistently.
5. Lowering Libxml version to 2.6.16

If there is anything that you think I left out please let me know. This is a
summary.

On Wed, Oct 13, 2010 at 3:48 AM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 On Tue, Oct 12, 2010 at 6:29 PM, Israel Ekpo israele...@gmail.com wrote:

  I think this feature will take care of this.
 
  What do you think?


 sounds good!






Re: Commits on service after shutdown

2010-10-18 Thread Israel Ekpo
The documents should be implicitly committed when the Lucene index is
closed.

When you perform a graceful shutdown, the Lucene index gets closed and the
documents get committed implicitly.

When the shutdown is abrupt as in a KILL -9, then this does not happen and
the updates are lost.

You can use the auto commit parameter when sending your updates so that the
changes are saved right away, though this could slow down the indexing
speed considerably, but I do not believe there are parameters to keep those
un-committed documents alive after a kill.
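As an illustration, here is a minimal Python sketch of what an update request
with an explicit commit looks like at the HTTP level. The host, port, and
payload below are assumptions; actually sending the request requires a running
Solr instance.

```python
from urllib.parse import urlencode

# Hypothetical Solr update endpoint; adjust host/port/core for your setup.
base = "http://localhost:8983/solr/update"

# commit=true asks Solr to commit immediately after applying the update;
# commitWithin=N would instead ask Solr to commit within N milliseconds.
params = urlencode({"commit": "true"})
url = base + "?" + params

doc_xml = "<add><doc><field name='id'>1</field></doc></add>"

# To actually send it (requires a running Solr instance):
# import urllib.request
# req = urllib.request.Request(url, data=doc_xml.encode(),
#                              headers={"Content-Type": "text/xml"})
# urllib.request.urlopen(req)

print(url)  # http://localhost:8983/solr/update?commit=true
```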



On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara ezech...@gmail.comwrote:

  Hi, i'm new in the mailing list.
 I'm implementing Solr in my actual job, and i'm having some problems.
 I was testing the consistency of the commits. I found for example that if
 we add X documents to the index (without commiting) and then we restart the
 service, the documents are commited. They show up in the results. This is
 interpreted to me like an error.
 But when we add X documents to the index (without commiting) and then we
 kill the process and we start it again, the documents doesn't appear. This
 behaviour is the one i want.

 Is there any param to avoid the auto-committing of documents after a
 shutdown?
 Is there any param to keep those un-commited documents alive after a
 kill?

 Thanks!

 --
 __
 Ezequiel.

 Http://www.ironicnet.com http://www.ironicnet.com/






Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-18 Thread Israel Ekpo
Hi All,

I am indexing a web application with approximately 9500 distinct URL and
contents using Nutch and Solr.

I use Nutch to fetch the URLs and links and to crawl the entire web
application to extract all the content for all pages.

Then I run the solrindex command to send the content to Solr.

The problem that I have now is that the first 1000 or so characters of some
pages and the last 400 characters of the pages are showing up in the search
results.

These are contents of the common header and footer used in the site
respectively.

The only work around that I have now is to index everything and then go
through each document one at a time to remove the first 1000 characters if
the levenshtein distance between the first 1000 characters of the page and
the common header is less than a certain value. Same applies to the footer
content common to all pages.

Is there a way to ignore certain stop phrases, so to speak, in the Nutch
configuration based on Levenshtein distance or Jaro-Winkler distance so that
certain parts of the fetched data that match these stop phrases will not be
parsed?

Any useful pointers would be highly appreciated.

Thanks in advance.
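The workaround described above can be sketched in Python. This is a rough
illustration using a pure-Python edit distance (in practice a C-backed library
would be much faster for 1000-character prefixes), and the threshold value is
an assumption:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def strip_common_header(text, header, threshold=100):
    # If the start of the page is close enough to the known common header,
    # drop that prefix before sending the document to Solr.
    prefix = text[:len(header)]
    if levenshtein(prefix, header) <= threshold:
        return text[len(header):]
    return text
```

The same function, applied to the last 400 characters, handles the footer case.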




Re: Setting solr home directory in websphere

2010-10-18 Thread Israel Ekpo
You need to make sure that the following system property is one of the
values specified in the JAVA_OPTS environment variable:

-Dsolr.solr.home=path_to_solr_home



On Mon, Oct 18, 2010 at 10:20 PM, Kevin Cunningham 
kcunning...@telligent.com wrote:

 I've installed Solr a hundred times using Tomcat (on Windows) but now need
 to get it going with WebSphere (on Windows).  For whatever reason this seems
 to be black magic :)  I've installed the war file but have no idea how to
 set Solr home to let WebSphere know where the index and config files are.
  Can someone enlighten me on how to do this please?






Re: Term is duplicated when updating a document

2010-10-15 Thread Israel Ekpo
Which fields are modified when the document is updated/replaced?

Are there any differences in the content of the fields that you are using
for the AutoSuggest?

Have you changed your schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down into
terms.

This may be a bug if you did not change the field or the schema file but the
terms count is still changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote:

 Hi,

 we are updating our documents (that represent products in our shop) when a
 dealer modifies them, by calling
 SolrServer.add(SolrInputDocument) with the updated document.

 My understanding is, that there is no other way of updating an existing
 document.


 However we also use a term query to autocomplete the search field for the
 user, but each time a document is updated (added) the term count is
 incremented. So after starting with a new index the count is e.g. 1, then
 the document (that contains that term) is updated, and the count is 2, the
 next update will set this to 3 and so on.

 Once the index is optimized (by calling SolrServer.optimize()) the count is
 correct again.

 Am I missing something or is this a bug in Solr/Lucene?

 Thanks in advance
 Thomas






Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-12 Thread Israel Ekpo
On Mon, Oct 11, 2010 at 3:33 AM, Lukas Kahwe Smith m...@pooteeweet.orgwrote:


 On 11.10.2010, at 07:03, Israel Ekpo wrote:

  I am currently working on a couple of bug fixes for the Solr PECL
 extension
  that will be available in the next release 0.9.12 sometime this month.
 
  http://pecl.php.net/package/solr
 
  Documentation of the current API and features for the PECL extension is
  available here
 
  http://www.php.net/solr
 
  A couple of users in the community were asking when the PHP extension
 will
  be moving from beta to stable.
 
  The API looks stable so far with no serious issues and I am looking to
  moving it from *Beta* to *Stable *on November 20 2010
 
  If you are using Solr via PHP and would like to see any new features in
 the
  extension please feel free to send me a note.
 
  I would like to incorporate those changes in 0.9.12 so that user can try
  them out and send me some feedback before the release of version 1.0
 
  Thanks in advance for your response.


 we already had some emails about this.
 imho there are too many methods for specialized tasks, that its easy to get
 lost in the API, especially since not all of them have written documentation
 yet beyond the method signatures.

 also i do think that there should be methods for escaping and also
 tokenizing lucene queries to enable validation of the syntax used etc.

 see here for a use case and a user land implementation:
 http://pooteeweet.org/blog/1796

 regards,
 Lukas Kahwe Smith
 m...@pooteeweet.org




Thanks Lukas for your feed back.

Could you clarify the part about too many methods for specialized tasks? From
the feedback that I have received so far, most users like the specialization
and a small fraction do not. So it might be a matter of preference. I
decided to add the specialized methods in the SolrQuery class because, at the
time, that was what most of the users wanted to see in the API. They cannot
be removed now.

As per the documentation, all of the methods are documented with at least a
brief heading or summary of what it is supposed to do.

http://php.net/solr

The user needs to understand first which query parameters they need to send
to Solr and then they can use one of the SolrQuery methods for that purpose.
Additional information is available from Solr tutorials and the wiki
itself. If one chooses not to use a specialized method, there are always the
get(), set() and add() methods that allow you to pass the parameter values
directly instead of using a specialized method for that parameter.

For escaping queries, we already have the following method

SolrUtils::escapeQueryChars
http://www.php.net/manual/en/solrutils.escapequerychars.php
http://www.php.net/manual/en/class.solrutils.php
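For illustration, the escaping amounts to backslash-prefixing the characters
that are special in the Lucene query syntax. Here is a rough Python equivalent;
the exact character set below is an approximation of what the extension
escapes, not a definitive list:

```python
# Characters with special meaning in the Lucene query syntax (approximate).
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_chars(s):
    # Prefix every special character with a backslash so the query parser
    # treats it literally instead of as syntax.
    return ''.join('\\' + c if c in SPECIAL else c for c in s)
```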

As per the tokenization, it is not clear exactly what you were referring to.
I think it is best for the analysis of any of the tokens to be handled at
the server layer. There are tools in the admin interface for analyzing and
breaking down the query components into tokens.

I also took a look at your blog but I could not immediately find the use case
you were referring to. A little more detail on this would be helpful.

Thanks Lukas for your input.



Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-12 Thread Israel Ekpo
On Tue, Oct 12, 2010 at 7:42 AM, Peter Blokland pe...@desk.nl wrote:

 hi,

 On Mon, Oct 11, 2010 at 01:03:07AM -0400, Israel Ekpo wrote:

  If you are using Solr via PHP and would like to see any new features in
 the
  extension please feel free to send me a note.

 I'm currently testing a setup with Solr via PHP, and was wondering if
 support for the ExtractingRequestHandler is planned ? It may be that I
 missed something in the documentation, but for now it looks like I need
 to build my own POST's to the /solr/update/extract handler.

 --
 CUL8R, Peter.

 www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™


Peter,

That is an excellent idea.

I will add that to the wishlist.



Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-12 Thread Israel Ekpo
On Tue, Oct 12, 2010 at 8:43 AM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Hi Isreal,

 On Mon, Oct 11, 2010 at 7:03 AM, Israel Ekpo israele...@gmail.com wrote:

  If you are using Solr via PHP and would like to see any new features in
 the
  extension please feel free to send me a note.


 we actually tried to grab some information from Solr's dataimport page, but
 for that we had to generate the complete URL manually, which means we have
 to access the Solr object to get the hostname, port, etc. and construct the
 needed URL ourselves.

 perhaps it's an idea to implement something like
 $solr->executeHttpRequest('GET', 'dataimport', array('command' => 'status'))
 which could easily reuse all the given information and also, for example,
 the existing proxy handling.

 Regards
 Stefan


Stefan,

I agree with you. Excellent idea.

I am currently working on a feature that will allow you to specify the
target path (URL) and then be able to send any parameters or XML request to
the server.

I think this feature will take care of this.

What do you think?



Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-12 Thread Israel Ekpo
On Tue, Oct 12, 2010 at 12:44 PM, Ken Stanley doh...@gmail.com wrote:

 
If you are using Solr via PHP and would like to see any new features
 in
   the
extension please feel free to send me a note.
 

 I'm new to this list, but in seeing this thread - and using PHP SOLR - I
 wanted to make a suggestion that - while minor - I think would greatly
 improve the quality of the extension.

 (I'm basing this mostly off of SolrQuery since that's where I've
 encountered
 the issue, but this might be true elsewhere)

 Whenever a method is supposed to return an array (i.e.,
 SolrQuery::getFields(), SolrQuery::getFacets(), etc), if there is no data
 to
 return, a null is returned. I think that this should be normalized across
 the board to return an empty array. First, the documentation is
 contradictory (http://us.php.net/manual/en/solrquery.getfields.php) in
 that
 the method signature says that it returns an array (not mixed), while the
 Return Values section says that it returns either an array or null.
 Secondly, returning an array under any circumstance provides more
 consistency and less logic; for example, let's say that I am looking for
 the
 fields (as-is in its current state):

 <?php
 // .. assume a proper set up
 if ($solrquery->getFields() !== null) {
     foreach ($solrquery->getFields() as $field) {
         // Do something
     }
 }
 ?>

 This is a minor request, I know. But, I feel that it would go a long way
 toward polishing the extension up for general consumption.

 Thank you,

 Ken Stanley

 PS. I apologize if this request has come through the pipes already; as I've
 stated, I am new to this list; I have yet to find any reference to my
 request. :)



Great recommendation Ken.

Thanks for catching that! That should be a quick one.



Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-10 Thread Israel Ekpo
Hi All,

I am currently working on a couple of bug fixes for the Solr PECL extension
that will be available in the next release 0.9.12 sometime this month.

http://pecl.php.net/package/solr

Documentation of the current API and features for the PECL extension is
available here

http://www.php.net/solr

A couple of users in the community were asking when the PHP extension will
be moving from beta to stable.

The API looks stable so far with no serious issues and I am looking to move
it from *Beta* to *Stable* on November 20, 2010.

If you are using Solr via PHP and would like to see any new features in the
extension please feel free to send me a note.

I would like to incorporate those changes in 0.9.12 so that users can try
them out and send me some feedback before the release of version 1.0.

Thanks in advance for your response.



Re: [PECL-DEV] Re: PHP Solr API

2010-10-01 Thread Israel Ekpo
Scott,

You can also use the SolrClient::setServlet() method with
SolrClient::TERMS_SERVLET_TYPE as the type

http://www.php.net/manual/en/solrclient.setservlet.php
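Under the hood, both approaches end up issuing a request against the terms
servlet. A rough Python sketch of the resulting HTTP request follows; the
host, port, field name and prefix are assumptions for illustration:

```python
from urllib.parse import urlencode

# Hypothetical terms request, equivalent to pointing the client at the
# /terms servlet with a couple of TermsComponent parameters.
params = urlencode({"terms": "true", "terms.fl": "text", "terms.prefix": "so"})
terms_url = "http://localhost:8983/solr/terms?" + params
```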



On Fri, Oct 1, 2010 at 12:57 AM, Scott Yeadon scott.yea...@anu.edu.auwrote:

  Hi,

 Sorry, scrap that, just found that SolrQuery is a subclass of
 ModifiableParams so can do this via add method and seems to work ok.

 Apologies for the noise.

 Scott.


 On 1/10/10 2:35 PM, Scott Yeadon wrote:

  Hi,

 Just wondering if there is a way of setting the qt parameter in the Solr
 PHP API. I want to use the Term Vector Component but not sure this is
 supported in the API?

 Thanks

 Scott.



 --
 PECL development discussion Mailing List (http://pecl.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php






Re: Null Pointer Exception while indexing

2010-09-16 Thread Israel Ekpo
Try removing the data directory and then restart your Servlet container and
see if that helps.

On Thu, Sep 16, 2010 at 3:28 AM, Lance Norskog goks...@gmail.com wrote:

 Which version of Solr? 1.4?, 1.4.1? 3.x branch? trunk? if the 3.x or the
 trunk, when did you pull it?


 andrewdps wrote:

 What could be possible error for

 14-Sep-10 4:28:47 PM org.apache.solr.common.SolrException log
 SEVERE: java.util.concurrent.ExecutionException:
 java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(libgcj.so.90)
at java.util.concurrent.FutureTask.get(libgcj.so.90)
at

 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:439)
at

 org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
at java.util.concurrent.Executors$RunnableAdapter.call(libgcj.so.90)
at java.util.concurrent.FutureTask$Sync.innerRun(libgcj.so.90)
at java.util.concurrent.FutureTask.run(libgcj.so.90)
at

 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$2(libgcj.so.90)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(libgcj.so.90)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(libgcj.so.90)
at java.lang.Thread.run(libgcj.so.90)
 Caused by: java.lang.NullPointerException
at
 org.apache.solr.search.FastLRUCache.getStatistics(FastLRUCache.java:252)
at org.apache.solr.search.FastLRUCache.toString(FastLRUCache.java:280)
at java.lang.StringBuilder.append(libgcj.so.90)
at
 org.apache.solr.search.SolrIndexSearcher.close(SolrIndexSearcher.java:223)
at org.apache.solr.core.SolrCore$6.close(SolrCore.java:1246)
at org.apache.solr.util.RefCounted.decref(RefCounted.java:57)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1192)
at java.util.concurrent.FutureTask$Sync.innerRun(libgcj.so.90)
at java.util.concurrent.FutureTask.run(libgcj.so.90)
...3 more

 I get this error (after indexing a few records I get the above error and
 indexing starts again; I get the same error after indexing a few hundred
 records) when I try to index the marc records on the server. It worked fine
 on the local system.

 Thanks







Re: ANNOUNCE: Stump Hoss @ Lucene Revolution

2010-08-23 Thread Israel Ekpo
Chris,

I have a couple of questions I would like to throw your way.

Is there a place where one can sign up for this?

It sounds very interesting.

On Mon, Aug 23, 2010 at 4:49 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 Hey everybody,

 As you (hopefully) have heard by now, Lucid Imagination is sponsoring a
 Lucene/Solr conference in Boston about 6 weeks from now.  We've got a lot of
 really great speakers lined up to give some really interesting technical
 talks, so I offered to do something a little bit different.

 I'm going to be in the hot seat for a Stump The Chump style session,
 where I'll be answering Solr questions live and unrehearsed...

http://bit.ly/stump-hoss

 The goal is to really make me sweat and work hard to think of creative
 solutions to non-trivial problems on the spot -- like when I answer
 questions on the solr-user mailing list, except in a crowded room with
 hundreds of people staring at me and laughing.

 But in order to be a success, we need your questions/problems/challenges!

 If you had a tough situation with Solr that you managed to solve with a
 creative solution (or haven't solved yet) and are interesting to see what
 type of solution I might come up with under pressure, please email a
 description of your problem to st...@lucenerevolution.org -- More details
 online...

 http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

 Even if you won't be able to make it to Boston, please send in any
 challenging problems you would be interested to see me tackle under the gun.
  The session will be recorded, and the video will be posted online shortly
 after the conference has ended.  And if you can make it to Boston: all the
 more fun to watch live and in person (and maybe answer follow up questions)

 In any case, it should be a very interesting session: folks will either get
 to learn a lot, or laugh at me a lot, or both.  (win/win/win)


 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!






Re: Disk usage per-field

2010-07-03 Thread Israel Ekpo
Currently, this feature is not available.

The amount of space a field consumes varies and depends on whether the field
is indexed only, stored only, or both indexed and stored.

It also depends on how the field is analyzed.

On Fri, Jul 2, 2010 at 2:59 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/30/2010 5:44 PM, Shawn Heisey wrote:

 Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of
 the total index disk space is used by each field?  It would also be very
 nice to know, for each field, how much is used by the index and how much is
 used for stored data.


 Still interested in this.  It would be perfectly OK if such a thing were
 completely external to Solr and required a good chunk of time to calculate.
  I would not need to do it very often.






[NEWS] New Response Writer for Native PHP Solr Client

2010-06-22 Thread Israel Ekpo
Hi Solr users,

If you are using Apache Solr via PHP, I have some good news for you.

There is a new response writer for the PHP native extension, currently
available as a plugin.

This new feature adds a new response writer class to the
org.apache.solr.request package.

This class is used by the PHP Native Solr Client driver to prepare the query
response from Solr.

This response writer allows you to configure the way the data is serialized
for the PHP client.

You can use your own class name and you can also control how the properties
are serialized.

The formatting of the response data is very similar to the way it is
currently done by the PECL extension on the client side.

The only difference now is that this serialization is happening on the
server side instead.

You will find this new response writer particularly useful when dealing with
responses for

- highlighting
- admin threads responses
- more like this responses

to mention just a few

You can pass the objectClassName request parameter to specify the class
name to be used for serializing objects.

Please note that the class must be available on the client side to avoid a
PHP_Incomplete_Object error during the unserialization process.

You can also pass in the objectPropertiesStorageMode request parameter
with either a 0 (independent properties) or a 1 (combined properties).

These parameters can also be passed as a named list when loading the
response writer in the solrconfig.xml file

Having this control allows you to create custom objects which gives the
flexibility of implementing custom __get methods, ArrayAccess, Traversable
and Iterator interfaces on the PHP client side.

Until this class is incorporated into Solr, you simply have to copy the jar
file containing this plugin into your lib directory under $SOLR_HOME.

The jar file is available here

https://issues.apache.org/jira/browse/SOLR-1967

Then set up the configuration as shown below and then restart your servlet
container

Below is an example configuration in solrconfig.xml

code
<queryResponseWriter name="phpnative"
class="org.apache.solr.request.PHPNativeResponseWriter">
<!-- You can choose a different class for your objects. Just make sure the
class is available in the client -->
<str name="objectClassName">SolrObject</str>
<!--
0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED

In independent mode, each property is a separate property.
In combined mode, all the properties are merged into a _properties array.
The combined mode allows you to create custom __getters and you could also
implement ArrayAccess, Iterator and Traversable.
-->
<int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>

code

Below is an example implementation on the PHP client side.

Support for specifying custom response writers will be available starting
from the 0.9.11 version (released today) of the PECL extension for Solr
currently available here

http://pecl.php.net/package/solr

Here is an example of how to use the new response writer with the PHP
client.

code
<?php

class SolrClass
{
    public $_properties = array();

    public function __get($property_name) {

        if (property_exists($this, $property_name)) {
            return $this->$property_name;
        } else if (isset($this->_properties[$property_name])) {
            return $this->_properties[$property_name];
        }

        return null;
    }
}

$options = array
(
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr/'
);

$client = new SolrClient($options);

$client->setResponseWriter("phpnative");

$response = $client->ping();

$query = new SolrQuery();

$query->setQuery("*:*");

$query->set("objectClassName", "SolrClass");
$query->set("objectPropertiesStorageMode", 1);

$response = $client->query($query);

$resp = $response->getResponse();

?>
code

Documentation of the changes to the PECL extension are available here

http://docs.php.net/manual/en/solrclient.construct.php
http://docs.php.net/manual/en/solrclient.setresponsewriter.php

Please contact me at ie...@php.net, if you have any questions or comments.



[PECL-DEV] [ANNOUNCEMENT] solr-0.9.11 (beta) Released

2010-06-21 Thread Israel Ekpo
The new PECL package solr-0.9.11 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Added ability to specify response writer in constructor option (wt)
- Added new method to set response writer SolrClient::setResponseWriter()
- Currently, the only supported response writers are 'xml' and 'phpnative'
- Added support for new native Solr response writer
- New response writer is available at
https://issues.apache.org/jira/browse/SOLR-1967

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
added in Solr 1.4. The extension has features such as built-in, serializable
query string builder objects which effectively simplifies the manipulation
of name-value pair request parameters across repeated requests. The response
from the Solr server is also automatically parsed into native php objects
whose properties can be accessed as array keys or object properties without
any additional configuration on the client-side. Its advanced HTTP client
reuses the same connection across multiple requests and provides built-in
support for connecting to Solr servers secured behind HTTP Authentication or
HTTP proxy servers. It is also able to connect to SSL-enabled containers.
Please consult the documentation for more details on features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.11.tgz
Documentation http://docs.php.net/

Authors
-
Israel Ekpo ie...@php.net (lead)



Re: SOLR search performance - Linux vs Windows servers

2010-06-16 Thread Israel Ekpo
That's a good note.

I get this kind of question a lot.

Most of the time, the reason is that there are database servers (MySQL),
web servers (Apache) and other processes running on the Linux box.

Try to verify that the load, number of processors/cores as well as other
environment settings are similar before drawing a conclusion.

On Wed, Jun 16, 2010 at 5:43 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 BB,

 Could it be that you are comparing apples and oranges?
 * Is the hardware identical?
 * Are indices identical?
 * Are JVM versions the same?
 * Are JVM arguments identical?
 * Are the two boxes equally idle when Solr is not running?

 * etc.

 In general, no, there is no reason why Windows would automatically be
 faster than Linux.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: bbarani bbar...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wed, June 16, 2010 5:06:55 PM
  Subject: SOLR search performance - Linux vs Windows servers
 
 
 Hi,

 I have SOLR instances running in both Linux / Windows servers (same version /
 same index data). Search performance is good in the Windows box compared to
 the Linux box.

 Some queries take more than 10 seconds in the Linux box but take just a
 second in the Windows box. Has anyone encountered this kind of issue before?

 Thanks,
 BB
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-search-performance-Linux-vs-Windows-servers-tp901069p901069.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Need help with document format

2010-06-07 Thread Israel Ekpo
I think you need a 1:1 mapping between the consultant and the company, else
how are you going to run your queries for let's say consultants that worked
for Google or AOL between March 1999 and August 2004?

If the mapping is 1:1, your life would be easier and you would not need to
do extra parsing of the results your retrieved.

Unfortunately, it looks like you are going to have a lot of records.

With an RDBMS, it is easier to do joins but with Lucene and Solr you have to
denormalize all the relationships.

Hence in this particular scenario, if you have 5 consultants that worked for
4 distinct companies you will have to send 20 documents to Solr
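As a sketch, the denormalization step can be expressed in Python. The input
rows and the company-id mapping below are hypothetical, and the field names
follow the schema quoted further down in this thread:

```python
# Hypothetical input: one row per consultant/company engagement.
engagements = [
    (1, "Michael", "Davis", "AOL",    "2006-02-13T15:26:37Z", "2008-02-13T15:26:37Z"),
    (1, "Michael", "Davis", "Google", "2006-02-13T15:26:37Z", "2009-02-13T15:26:37Z"),
    (2, "Tom", "Anderson", "Yahoo",   "2001-01-13T15:26:37Z", "2009-02-13T15:26:37Z"),
]

# Hypothetical numeric ids for the companies.
company_ids = {"AOL": 1, "Microsoft": 2, "Yahoo": 3, "Google": 4, "Facebook": 5}

def denormalize(rows):
    # Emit one Solr document per consultant/company pair, keyed by a
    # concatenation of the consultant id and the company id.
    docs = []
    for cid, first, last, company, start, end in rows:
        docs.append({
            "consultant_id_company_id": "%d_%d" % (cid, company_ids[company]),
            "consultant_firstname": first,
            "consultant_lastname": last,
            "company": company,
            "start_date": start,
            "end_date": end,
        })
    return docs

docs = denormalize(engagements)
```

Each dictionary in `docs` then maps directly to one `<doc>` element sent to
Solr.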

On Mon, Jun 7, 2010 at 10:15 AM, Moazzam Khan moazz...@gmail.com wrote:

 Thanks for the replies guys.


 I am currently storing consultants like this ..

 <doc>
  <id>123</id>
  <FirstName>tony</FirstName>
  <LastName>marjo</LastName>
  <Company>Google</Company>
  <Company>AOL</Company>
 </doc>

 I have a few multi valued fields so if I do it the way Israel
 suggested it, I will have tons of records. Do you think it will be
 better if I did this instead ?


 <doc>
  <id>123</id>
  <FirstName>tony</FirstName>
  <LastName>marjo</LastName>
  <Company>Google_StartDate_EndDate</Company>
  <Company>AOL_StartDate_EndDate</Company>
 </doc>

 Or is what you guys said better?

 Thanks for all the help.

 Moazzam


 On Mon, Jun 7, 2010 at 1:10 AM, Lance Norskog goks...@gmail.com wrote:
  And for 'present', you would pick some time far in the future:
  2100-01-01T00:00:00Z
 
  On 6/5/10, Israel Ekpo israele...@gmail.com wrote:
  You need to make each document added to the index a 1 to 1 mapping for
 each
  company and consultant combo
 
  <schema>

  <fields>
  <!-- Concatenation of company and consultant id -->
  <field name="consultant_id_company_id" type="string" indexed="true"
  stored="true" required="true"/>
  <field name="consultant_firstname" type="string" indexed="true"
  stored="true" multiValued="false"/>
  <field name="consultant_lastname" type="string" indexed="true"
  stored="true" multiValued="false"/>

  <!-- The name of the company the consultant worked for -->
  <field name="company" type="text" indexed="true" stored="true"
  multiValued="false"/>
  <field name="start_date" type="tdate" indexed="true" stored="true"
  multiValued="false"/>
  <field name="end_date" type="tdate" indexed="true" stored="true"
  multiValued="false"/>
  </fields>

  <defaultSearchField>text</defaultSearchField>

  <copyField source="consultant_firstname" dest="text"/>
  <copyField source="consultant_lastname" dest="text"/>
  <copyField source="company" dest="text"/>

  </schema>
 
  <!--
 
  So for instance, you have 2 consultants
 
  Michael Davis and Tom Anderson who worked for AOL and Microsoft, Yahoo,
  Google and Facebook.
 
  Michael Davis = 1
  Tom Anderson = 2
 
  AOL = 1
  Microsoft = 2
  Yahoo = 3
  Google = 4
  Facebook = 5
 
  This is how you would add the documents to the index
 
  -->
 
  <doc>
  <consultant_id_company_id>1_1</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>AOL</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2008-02-13T15:26:37Z</end_date>
  </doc>

  <doc>
  <consultant_id_company_id>1_4</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>Google</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
  </doc>

  <doc>
  <consultant_id_company_id>2_3</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Yahoo</company>
  <start_date>2001-01-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
  </doc>

  <doc>
  <consultant_id_company_id>2_4</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Google</company>
  <start_date>1999-02-13T15:26:37Z</start_date>
  <end_date>2010-02-13T15:26:37Z</end_date>
  </doc>
 
 
  Then you can search as
 
  q=company:C AND start_date:[X TO *] AND end_date:[* TO Z]
 
  On Fri, Jun 4, 2010 at 4:58 PM, Moazzam Khan moazz...@gmail.com
 wrote:
 
  Hi guys,
 
 
  I have a list of consultants and the users (people who work for the
  company) are supposed to be able to search for consultants based on
  the time frame they worked for, for a company. For example, I should
  be able to search for all consultants who worked for Bear Stearns in
  the month of July. What is the best way of accomplishing this?
 
  I was thinking of formatting the document like this
 
  <company>
    <name>Bear Stearns</name>
    <startDate>2000-01-01</startDate>
    <endDate>present</endDate>
  </company>
  <company>
    <name>AIG</name>
    <startDate>1999-01-01</startDate>
    <endDate>2000-01-01</endDate>
  </company>
 
  Is this possible?
 
  Thanks,
 
  Moazzam
 
 
 
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice

Re: Need help with document format

2010-06-05 Thread Israel Ekpo
You need to make each document added to the index a 1 to 1 mapping for each
company and consultant combo

<schema>

<fields>
<!-- Concatenation of company and consultant id -->
<field name="consultant_id_company_id" type="string" indexed="true"
stored="true" required="true"/>
<field name="consultant_firstname" type="string" indexed="true"
stored="true" multiValued="false"/>
<field name="consultant_lastname" type="string" indexed="true"
stored="true" multiValued="false"/>

<!-- The name of the company the consultant worked for -->
<field name="company" type="text" indexed="true" stored="true"
multiValued="false"/>
<field name="start_date" type="tdate" indexed="true" stored="true"
multiValued="false"/>
<field name="end_date" type="tdate" indexed="true" stored="true"
multiValued="false"/>
</fields>

<defaultSearchField>text</defaultSearchField>

<copyField source="consultant_firstname" dest="text"/>
<copyField source="consultant_lastname" dest="text"/>
<copyField source="company" dest="text"/>

</schema>

<!--

So for instance, you have 2 consultants

Michael Davis and Tom Anderson who worked for AOL and Microsoft, Yahoo,
Google and Facebook.

Michael Davis = 1
Tom Anderson = 2

AOL = 1
Microsoft = 2
Yahoo = 3
Google = 4
Facebook = 5

This is how you would add the documents to the index

-->

<doc>
<consultant_id_company_id>1_1</consultant_id_company_id>
<consultant_firstname>Michael</consultant_firstname>
<consultant_lastname>Davis</consultant_lastname>
<company>AOL</company>
<start_date>2006-02-13T15:26:37Z</start_date>
<end_date>2008-02-13T15:26:37Z</end_date>
</doc>

<doc>
<consultant_id_company_id>1_4</consultant_id_company_id>
<consultant_firstname>Michael</consultant_firstname>
<consultant_lastname>Davis</consultant_lastname>
<company>Google</company>
<start_date>2006-02-13T15:26:37Z</start_date>
<end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
<consultant_id_company_id>2_3</consultant_id_company_id>
<consultant_firstname>Tom</consultant_firstname>
<consultant_lastname>Anderson</consultant_lastname>
<company>Yahoo</company>
<start_date>2001-01-13T15:26:37Z</start_date>
<end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
<consultant_id_company_id>2_4</consultant_id_company_id>
<consultant_firstname>Tom</consultant_firstname>
<consultant_lastname>Anderson</consultant_lastname>
<company>Google</company>
<start_date>1999-02-13T15:26:37Z</start_date>
<end_date>2010-02-13T15:26:37Z</end_date>
</doc>


Then you can search as

q=company:C AND start_date:[X TO *] AND end_date:[* TO Z]

On Fri, Jun 4, 2010 at 4:58 PM, Moazzam Khan moazz...@gmail.com wrote:

 Hi guys,


 I have a list of consultants and the users (people who work for the
 company) are supposed to be able to search for consultants based on
 the time frame they worked for, for a company. For example, I should
 be able to search for all consultants who worked for Bear Stearns in
 the month of July. What is the best way of accomplishing this?

 I was thinking of formatting the document like this

 <company>
   <name>Bear Stearns</name>
   <startDate>2000-01-01</startDate>
   <endDate>present</endDate>
 </company>
 <company>
   <name>AIG</name>
   <startDate>1999-01-01</startDate>
   <endDate>2000-01-01</endDate>
 </company>

 Is this possible?

 Thanks,

 Moazzam




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Solr spellchecker field

2010-05-28 Thread Israel Ekpo
Dejan,

How are you making the calls from PHP to Solr?

I am curious to know why the documents could not be parsed

On Fri, May 28, 2010 at 5:00 AM, Dejan Noveski dr.m...@gmail.com wrote:

 Thank you very much!

 On Fri, May 28, 2010 at 10:57 AM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  A field used to build a spellcheck index only needs to be indexed, not
  stored.
 
  But, your PHP issue could be alleviated anyway by simply customizing the
 fl
  parameter and excluding the large stored field.  This is often desirable
 for
  large fields that are never needed fully in the UI, but used internally
 for
  highlighting.
 
 Erik
 
 
  On May 28, 2010, at 4:47 AM, Dejan Noveski wrote:
 
   Hi,
 
  Does the field that is used for spellchecker indexing need to be stored
  and/or indexed? These fields became fairly large in my index, and PHP won't
  parse/decode the documents returned.
 
  --
  --
  Dejan Noveski
  Web Developer
  dr.m...@gmail.com
  Twitter: http://twitter.com/dekomote | LinkedIn:
  http://mk.linkedin.com/in/dejannoveski
 
 
 


 --
 --
 Dejan Noveski
 Web Developer
 dr.m...@gmail.com
 Twitter: http://twitter.com/dekomote | LinkedIn:
 http://mk.linkedin.com/in/dejannoveski




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
Hello Lucene and Solr Community

I have a custom org.apache.lucene.search.Filter that I would like to
contribute to the Lucene and Solr projects.

So I would need some direction on how to create an ISSUE or submit a
patch.

It looks like there have been changes to the way this is done since the
latest merge of the two projects (Lucene and Solr).

Recently, some Solr users have been looking for a way to perform bitwise
operations between an integer value and some fields in the index.

So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter.

This package makes it possible to filter results returned from a query based
on the results of a bitwise operation on an integer field in the documents
returned from the pre-constructed query.

You can perform three basic types of operations on these integer fields

* BitwiseOperation.BITWISE_AND (bitwise AND)
* BitwiseOperation.BITWISE_OR (bitwise inclusive OR)
* BitwiseOperation.BITWISE_XOR (bitwise exclusive OR)

You can also negate the results of these operations.

For example, imagine there is an integer field in the index named flags
with a value of 8 (1000 in binary). The following results are expected:

   1. A source value of 8 will match during a BitwiseOperation.BITWISE_AND
operation, with negate set to false.
   2. A source value of 4 will match during a BitwiseOperation.BITWISE_AND
operation, with negate set to true.
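The matching rule in that flags example can be sketched in plain Java. This is an illustration of one plausible reading (match when the bitwise result is non-zero, optionally negated), not the plugin's actual source:

```java
public class BitwiseMatchSketch {
    enum Op { AND, OR, XOR }

    // A document matches when (fieldValue op source) is non-zero;
    // negate flips the outcome.
    static boolean matches(int fieldValue, Op op, int source, boolean negate) {
        int result;
        switch (op) {
            case AND: result = fieldValue & source; break;
            case OR:  result = fieldValue | source; break;
            default:  result = fieldValue ^ source; break;
        }
        return negate ? result == 0 : result != 0;
    }

    public static void main(String[] args) {
        int flags = 8; // 1000 in binary, as in the example above
        System.out.println(matches(flags, Op.AND, 8, false)); // true: 8 & 8 != 0
        System.out.println(matches(flags, Op.AND, 4, true));  // true: 8 & 4 == 0, negated
    }
}
```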

The BitwiseFilter constructor accepts the following values

* The name of the integer field (A string)
* The BitwiseOperation object. Example BitwiseOperation.BITWISE_XOR
* The source value (an integer)
* A boolean value indicating whether or not to negate the results of the
operation
* A pre-constructed org.apache.lucene.search.Query

Here is an example of how you would use it with Solr

http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions
op=AND source=3 negate=true}state:FL

http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions
op=AND source=3}state:FL

Here is an example of how you would use it with Lucene

public class BitwiseTestSearch extends BitwiseTestBase {

    public BitwiseTestSearch()
    {
    }

    public void search() throws IOException, ParseException
    {
        setupSearch();

        // term
        Term t = new Term(COUNTRY_KEY, "us");

        // term query
        Query q = new TermQuery(t);

        // maximum number of documents to display
        int limit = 1000;

        int sourceValue = 0;

        boolean negate = false;

        BitwiseFilter bitwiseFilter = new BitwiseFilter(USER_PERMS_KEY,
                BitwiseOperation.BITWISE_XOR, sourceValue, negate, q);

        Query fq = new FilteredQuery(q, bitwiseFilter);

        ScoreDoc[] hits = isearcher.search(fq, null, limit).scoreDocs;

        BitwiseResultFilter resultFilter = bitwiseFilter.getResultFilter();

        for (int i = 0; i < hits.length; i++) {

            Document hitDoc = isearcher.doc(hits[i].doc);

            System.out.println(FIRST_NAME_KEY + " field has a value of " + hitDoc.get(FIRST_NAME_KEY));
            System.out.println(LAST_NAME_KEY + " field has a value of " + hitDoc.get(LAST_NAME_KEY));
            System.out.println(ACTIVE_KEY + " field has a value of " + hitDoc.get(ACTIVE_KEY));

            System.out.println(USER_PERMS_KEY + " field has a value of " + hitDoc.get(USER_PERMS_KEY));

            System.out.println("doc ID -- " + hits[i].doc);

            System.out.println("...");
        }

        System.out.println("sourceValue = " + sourceValue + ", operation = "
                + resultFilter.getOperation().getOperationName() + ", negate = " + negate);

        System.out.println("A total of " + hits.length + " documents were found from the search\n");

        shutdown();
    }

    public static void main(String args[]) throws IOException, ParseException
    {
        BitwiseTestSearch search = new BitwiseTestSearch();

        search.search();
    }
}

Any guidance would be highly appreciated.

Thanks.


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
I have created two ISSUES as new features

https://issues.apache.org/jira/browse/LUCENE-1560

https://issues.apache.org/jira/browse/SOLR-1913

The first one is for the Lucene Filter.

The second one is for the Solr QParserPlugin

The source code and jar files are attached and the Solr plugin is available
for use immediately.



On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki a...@getopt.org wrote:

 On 2010-05-13 23:27, Israel Ekpo wrote:
  Hello Lucene and Solr Community
 
  I have a custom org.apache.lucene.search.Filter that I would like to
  contribute to the Lucene and Solr projects.
 
  So I would need some direction on how to create an ISSUE or submit a
  patch.
 
  It looks like there have been changes to the way this is done since the
  latest merge of the two projects (Lucene and Solr).
 
  Recently, some Solr users have been looking for a way to perform bitwise
  operations between an integer value and some fields in the index
 
  So, I wrote a Solr QParser plugin to do this using a custom Lucene
 Filter.
 
  This package makes it possible to filter results returned from a query
 based
  on the results of a bitwise operation on an integer field in the
 documents
  returned from the pre-constructed query.

 Hi,

 What a coincidence! :) I'm working on something very similar, only the
 use case that I need to support is slightly different - I want to
 support a ranked search based on a bitwise overlap of query value and
 field value. That is, the number of differing bits would reduce the
 score. This scenario occurs e.g. during near-duplicate detection that
 uses fuzzy signatures, on document- or sentence levels.

 I'm going to submit my code early next week, it still needs some
 polishing. I have two ways to execute this query, neither of which uses
 filters at the moment:

 * method 1: during indexing the bits in the fields are turned into
 on/off terms on the same field, and during search a BooleanQuery is
 formed from the int value with the same terms. Scoring is courtesy of
 BooleanScorer. This method supports only a single int value per field.

 * method 2, incomplete yet - during indexing the bits are turned into
 terms as before, but this method supports multiple int values per field:
 terms that correspond to bitmasks on the same value are put at the same
 positions. Then a specialized Query / Scorer traverses all 32 posting
 lists in parallel, moving through all matching docs and scoring
 according to how many terms matched at the same position.
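The scoring idea described here - each differing bit reduces the score - can be sketched with a popcount over the XOR of the two signatures (a stdlib illustration under that assumption, not the actual implementation being polished):

```java
public class BitOverlapScoreSketch {
    // Fraction of the 32 bits on which the query and field signatures
    // agree; each differing bit (Hamming distance) reduces the score.
    static double score(int querySignature, int fieldSignature) {
        int differingBits = Integer.bitCount(querySignature ^ fieldSignature);
        return (32 - differingBits) / 32.0;
    }

    public static void main(String[] args) {
        System.out.println(score(0b1010, 0b1010)); // identical signatures: 1.0
        System.out.println(score(0b1010, 0b1000)); // one differing bit: 31/32
    }
}
```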

 I wrapped this in a Solr FieldType, and instead of using a custom
 QParser plugin I simply implemented FieldType.getFieldQuery().

 It would be great to work out a convenient user-level API for this
 feature, both the scoring and the non-scoring case.

 --
 Best regards,
 Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Best way to handle bitfields in solr...

2010-05-13 Thread Israel Ekpo
William,

This QParserPlugin should solve that problem now.

Check out https://issues.apache.org/jira/browse/SOLR-1913

BitwiseQueryParserPlugin is a org.apache.solr.search.QParserPlugin that
allows users to filter the documents returned from a query by performing
bitwise operations between a particular integer field in the index and the
specified value.

The plugin is available immediately for your use.

On Fri, Dec 4, 2009 at 4:03 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Would http://wiki.apache.org/solr/FunctionQuery#fieldvalue help?

  Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



 - Original Message 
  From: William Pierce evalsi...@hotmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, December 4, 2009 2:43:25 PM
  Subject: Best way to handle bitfields in solr...
 
  Folks:
 
  In my db I currently have fields that represent bitmasks.   Thus, for
 example, a
  mask value of 48 might represent an undergraduate (value = 16)
 and
  graduate (value = 32).   Currently,  the corresponding field in solr is
 a
  multi-valued string field called EdLevel which will have
  Undergraduate and Graduate  as its two values (for
  this example).   I do the conversion from the int into the list of values
 as I
  do the indexing.
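That int-to-multi-value conversion at indexing time might look like the sketch below (flag values taken from the example above; the class and field names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EdLevelDecodeSketch {
    // Flag values from the example: undergraduate = 16, graduate = 32.
    static final Map<Integer, String> LEVELS = new LinkedHashMap<>();
    static {
        LEVELS.put(16, "Undergraduate");
        LEVELS.put(32, "Graduate");
    }

    // Expand a bitmask into the multi-valued strings indexed in the
    // EdLevel field.
    static List<String> decode(int mask) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : LEVELS.entrySet()) {
            if ((mask & entry.getKey()) != 0) {
                values.add(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(decode(48)); // [Undergraduate, Graduate]
    }
}
```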
 
  Ideally, I'd like solr to have bitwise operations so that I could store
 the int
  value, and then simply search by using bit operations.  However, given
 that this
  is not possible,  and that there have been recent threads speaking to
  performance issues with multi-valued fields,  is there something better I
 could
  do?
 
  TIA,
 
  - Bill




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Bitwise Operations on Integer Fields in Lucene and Solr Index

2010-05-13 Thread Israel Ekpo
Correction,

I meant to list

https://issues.apache.org/jira/browse/LUCENE-2460
https://issues.apache.org/jira/browse/SOLR-1913



On Thu, May 13, 2010 at 10:13 PM, Israel Ekpo israele...@gmail.com wrote:

 I have created two ISSUES as new features

 https://issues.apache.org/jira/browse/LUCENE-1560

 https://issues.apache.org/jira/browse/SOLR-1913

 The first one is for the Lucene Filter.

 The second one is for the Solr QParserPlugin

 The source code and jar files are attached and the Solr plugin is available
 for use immediately.




 On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki a...@getopt.org wrote:

 On 2010-05-13 23:27, Israel Ekpo wrote:
  Hello Lucene and Solr Community
 
  I have a custom org.apache.lucene.search.Filter that I would like to
  contribute to the Lucene and Solr projects.
 
  So I would need some direction on how to create an ISSUE or submit a
  patch.
 
  It looks like there have been changes to the way this is done since the
  latest merge of the two projects (Lucene and Solr).
 
  Recently, some Solr users have been looking for a way to perform bitwise
  operations between an integer value and some fields in the index
 
  So, I wrote a Solr QParser plugin to do this using a custom Lucene
 Filter.
 
  This package makes it possible to filter results returned from a query
 based
  on the results of a bitwise operation on an integer field in the
 documents
  returned from the pre-constructed query.

 Hi,

 What a coincidence! :) I'm working on something very similar, only the
 use case that I need to support is slightly different - I want to
 support a ranked search based on a bitwise overlap of query value and
 field value. That is, the number of differing bits would reduce the
 score. This scenario occurs e.g. during near-duplicate detection that
 uses fuzzy signatures, on document- or sentence levels.

 I'm going to submit my code early next week, it still needs some
 polishing. I have two ways to execute this query, neither of which uses
 filters at the moment:

 * method 1: during indexing the bits in the fields are turned into
 on/off terms on the same field, and during search a BooleanQuery is
 formed from the int value with the same terms. Scoring is courtesy of
 BooleanScorer. This method supports only a single int value per field.

 * method 2, incomplete yet - during indexing the bits are turned into
 terms as before, but this method supports multiple int values per field:
 terms that correspond to bitmasks on the same value are put at the same
 positions. Then a specialized Query / Scorer traverses all 32 posting
 lists in parallel, moving through all matching docs and scoring
 according to how many terms matched at the same position.

 I wrapped this in a Solr FieldType, and instead of using a custom
 QParser plugin I simply implemented FieldType.getFieldQuery().

 It would be great to work out a convenient user-level API for this
 feature, both the scoring and the non-scoring case.

 --
 Best regards,
 Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.
 http://www.israelekpo.com/




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


[PECL-DEV] [ANNOUNCEMENT] solr-0.9.10 (beta) Released

2010-05-04 Thread Israel Ekpo
The new PECL package solr-0.9.10 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Increased compatibility with older systems running CentOS 4 or 5 and RHEL4
or 5
- Added ability to compile directly without having to build libcurl and
libxml2 from source on older systems
- Lowered minimum supported version for libcurl to 7.15.0 (Alex Samorukov)
- Lowered minimum supported version for libxml2 to 2.6.26 (Alex Samorukov)
- Fixed PECL Bug# 17172 MoreLikeThis only parses one doc (trevor at blubolt
dot com, max at blubolt dot com)
- Declared workaround macros for SSL private key constants due to support
for earlier versions of libcurl (Alex Samorukov)
- Changed extension version numbers to start using hexadecimal numbers
(Israel Ekpo)
- Added instructions on how to attempt to compile on windows (Israel Ekpo)
- Fixed PECL Bug# 17292 sending UTF-8 encoding in header (giguet at info dot
unicaen dot fr)

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
added in Solr 1.4. The extension has features such as built-in, serializable
query string builder objects which effectively simplifies the manipulation
of name-value pair request parameters across repeated requests. The response
from the Solr server is also automatically parsed into native php objects
whose properties can be accessed as array keys or object properties without
any additional configuration on the client-side. Its advanced HTTP client
reuses the same connection across multiple requests and provides built-in
support for connecting to Solr servers secured behind HTTP Authentication or
HTTP proxy servers. It is also able to connect to SSL-enabled containers.
Please consult the documentation for more details on features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.10.tgz
Documentation: http://www.php.net/solr

Authors
-
Israel Ekpo ie...@php.net (lead)


Re: Evangelism

2010-04-29 Thread Israel Ekpo
Checkout Lucid Imagination

http://www.lucidimagination.com/About-Search

This should convince you.

On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.comwrote:

 Hi I'm new to the list here,



 I'd like to steer someone in the direction of Solr, and I see the list of
  companies using Solr, but none have a powered by Solr logo or anything.



 Does anyone have any great links with evidence to majorly successful solr
 projects?



 Thanks in advance,



 Dan B.






-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Evangelism

2010-04-29 Thread Israel Ekpo
Their main search page has the Powered by Solr logo

http://www.lucidimagination.com/search/



On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo israele...@gmail.com wrote:

 Checkout Lucid Imagination

 http://www.lucidimagination.com/About-Search

 This should convince you.


 On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.comwrote:

 Hi I'm new to the list here,



 I'd like to steer someone in the direction of Solr, and I see the list of
  companies using Solr, but none have a powered by Solr logo or anything.



 Does anyone have any great links with evidence to majorly successful solr
 projects?



 Thanks in advance,



 Dan B.






 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.
 http://www.israelekpo.com/




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Evangelism

2010-04-29 Thread Israel Ekpo
A lot of high performing websites use MySQL, Oracle and Microsoft SQL Server
for data storage and other RDBMS needs without necessarily putting the
powered by logo on the sites.

If you need the certified version of Apache Solr, you can contact Lucid
Imagination.

Just like MySQL, Apache Solr and Apache Lucene also have commercial backing
(from Lucid Imagination) if you choose to go that route.

On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin 
knagelb...@globeandmail.com wrote:

 I had a very hard time selling Solr to business folks. Most are of the mind
 that if you're not paying for something it can't be any good. That might
 also be why they refrain from posting 'powered by solr' on their website, as
 if it might show them to be cheap. They are also fearful of lack of support
 should you get hit by a bus. This might be remedied by recommending
 professional services from a company such as lucid imagination.

 I think your best bet is to create a working demo with your data and show
 them the performance.

 Cheers,
 -Kallin Nagelberg



 -Original Message-
 From: Israel Ekpo [mailto:israele...@gmail.com]
 Sent: Thursday, April 29, 2010 2:19 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Evangelism

 Their main search page has the Powered by Solr logo

 http://www.lucidimagination.com/search/



 On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo israele...@gmail.com wrote:

  Checkout Lucid Imagination
 
  http://www.lucidimagination.com/About-Search
 
  This should convince you.
 
 
  On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.com
 wrote:
 
  Hi I'm new to the list here,
 
 
 
  I'd like to steer someone in the direction of Solr, and I see the list
 of
  companies using Solr, but none have a powered by Solr logo or anything.
 
 
 
  Does anyone have any great links with evidence to majorly successful
 solr
  projects?
 
 
 
  Thanks in advance,
 
 
 
  Dan B.
 
 
 
 
 
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
  http://www.israelekpo.com/
 



 --
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.
 http://www.israelekpo.com/




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Tutorials for developing filter plugins.

2010-04-11 Thread Israel Ekpo
He is referring to the org.apache.lucene.search.Filter classes.

Michael,

I did a search too and I could not really find any useful tutorials on the
subject.

You can take a look at how this is implemented in the Spatial Solr Plugin by
the JTeam

http://www.jteam.nl/news/spatialsolr.html

Their code, I believe, uses the bits() method which has been deprecated in
Lucene 2.9 and removed in 3.0.

The getDocIdSet() method returns a DocIdSet object, which you can build from
an org.apache.lucene.util.OpenBitSet.

I think there is probably some example in the new version (2nd Edition) of
the *Lucene in Action* book on how to do something similar.

You should check it out from the Manning Early Access Program page.

http://www.manning.com/hatcher3/

You should also check out the Solr 1.5 source code for how some of the
Lucene Filter classes are designed.



On Sat, Apr 10, 2010 at 5:23 AM, MitchK mitc...@web.de wrote:


 Hi Michael,

 do you mean a TokenFilter like StopWordFilter?

 If you like, you could post some code, so one can help you.
 It's really easy to develop some TokenFilters, if you have a look at
 already
 implemented ones.

 Kind regards
 - Mitch
 --
 View this message in context:
 http://n3.nabble.com/Tutorials-for-developing-filter-plugins-tp706874p709897.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Israel Ekpo
Chris,

I don't see anything in the headers suggesting that Julian's message was a
hijack of another thread

On Thu, Apr 1, 2010 at 2:17 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : Subject: add/update document as distinct operations? Is it possible?
 : References:
 :
 dc9f7963609bed43b1ab02f3ce52863103dc35f...@bene-exch-01.benetech.local
 : In-Reply-To:
 :
 dc9f7963609bed43b1ab02f3ce52863103dc35f...@bene-exch-01.benetech.local

  http://people.apache.org/~hossman/#threadhijack
 Thread Hijacking on Mailing Lists

 When starting a new discussion on a mailing list, please do not reply to
 an existing message, instead start a fresh email.  Even if you change the
 subject line of your email, other mail headers still track which thread
 you replied to and your question is hidden in that thread and gets less
 attention.   It makes following discussions in the mailing list archives
 particularly difficult.
 See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




 -Hoss




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: selecting documents older than 4 hours

2010-04-01 Thread Israel Ekpo
I did something similar.

The only difference with my setup is that I have two fields: one that
stores the date the document was first created and a second that stores the
date it was last updated, as Unix timestamps.

So my query to find documents that are older than 4 hours is very easy.

To find documents that were last updated more than four hours ago, you would
do something like this:

q=last_update_date:[* TO 1270119278]

The current timestamp is 1270133678; 4 hours ago it was 1270119278.

The field type in the schema is tint.
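Computing the cutoff client-side is a one-liner; the sketch below reproduces the figures quoted above (four hours = 14400 seconds):

```java
import java.time.Instant;

public class EpochCutoffSketch {
    // Unix timestamp for "four hours before now"; documents whose
    // last_update_date is at or below this value are older than four hours.
    static long fourHoursBefore(long nowEpochSeconds) {
        return nowEpochSeconds - 4 * 3600;
    }

    public static void main(String[] args) {
        // The figures from the message: now = 1270133678
        System.out.println(fourHoursBefore(1270133678L)); // 1270119278

        // Against the real clock:
        long cutoff = fourHoursBefore(Instant.now().getEpochSecond());
        System.out.println("q=last_update_date:[* TO " + cutoff + "]");
    }
}
```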



On Wed, Mar 31, 2010 at 11:18 PM, herceg_novi herceg_n...@yahoo.com wrote:


 Hello, I'd like to select documents older than 4 hours in my Solr 1.4
 installation.

 The query

 q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]

 does not return a correct recordset. I would expect to get all documents
 with last_update_date in the specified range. Instead solr returns all
 documents that exist in the index which is not what I would expect.
 Last_update_date is SolrDate field.

 This does not work either
 q=last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-4HOURS]

 This works, but I manually had to calculate the 4-hour difference and insert
 a Solr date-formatted timestamp into my query (I prefer not to do that):
 q=last_update_date:[NOW/DAY-7DAYS TO 2010-03-31T19:40:34Z]

 Any ideas if I can get this to work as expected?
 q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]

 Thanks!
 --
 View this message in context:
 http://n3.nabble.com/selecting-documents-older-than-4-hours-tp689975p689975.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Features not present in Solr

2010-03-22 Thread Israel Ekpo
On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog goks...@gmail.com wrote:

 Web crawling.


I don't think Solr was designed with web crawling in mind. Nutch would be
better suited for that, I believe.


 Text analysis.


This is a bit vague.

Please elaborate further. There is already a lot of analysis (stemming,
stop-word removal, character transformation, etc.) that takes place
implicitly, based on the fields you define and use in the schema.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


 Distributed index management.
 A fanatical devotion to the Pope.

There are probably a lot of features already available in Solr out of the box
that most of those other enterprise-level applications do not have yet.

You would also be surprised to learn that a lot of them use Lucene under the
covers and are actually trying to re-implement what is already available in
Solr.


 On Sun, Mar 21, 2010 at 11:19 PM, MitchK mitc...@web.de wrote:
 
  Srikanth,
 
  I don't know anything about Endeca, so I can't compare Solr to it.
  However, I know Solr is powerful. Very powerful.
  So, maybe you should tell us more about your needs to get a good answer.
 
  As a response to your second question: You should not expect that Solr is
  a database. It is an index-server. A database makes your data save. If
 there
  goes something wrong - which is always possible - Solr gives no
 warranties.
  Maybe someone other can tell you more about this topic.
 
  - Mitch
 
 
  Srikanth B wrote:
 
  Hello
 
  We are in the process of researching on Solr features. I am looking for
  two
  things
  1. Features not available in Solr but present in other products
  like
  Endeca
  2. What one shouldn't not expect from Solr
 
  Any thoughts ?
 
  Thanks in advance
  Srikanth
 
 
 
  --
  View this message in context:
 http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 



 --
 Lance Norskog
 goks...@gmail.com






Re: Features not present in Solr

2010-03-20 Thread Israel Ekpo
One feature that is not available in Solr is licensing fees and fine
print.

Also you should not expect to pay in order to use Solr.

On Fri, Mar 19, 2010 at 11:16 PM, Srikanth B srikanth...@gmail.com wrote:

 Hello

 We are in the process of researching on Solr features. I am looking for two
 things
1. Features not available in Solr but present in other products like
 Endeca
2. What one shouldn't not expect from Solr

 Any thoughts ?

 Thanks in advance
 Srikanth






Re: Search on dynamic fields which contains spaces /special characters

2010-03-08 Thread Israel Ekpo
I do not believe the Solr or Lucene query syntax allows this.

You need to get rid of all the spaces in the field name.

If not, you will be searching for "short" in the default field and
"name1" in the "name" field.

http://wiki.apache.org/solr/SolrQuerySyntax

http://lucene.apache.org/java/2_9_2/queryparsersyntax.html
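Since field names containing spaces cannot be referenced in the query syntax, one workaround is to normalize the names before indexing. A minimal sketch (the normalization rule here is my own assumption, not a Solr convention):

```python
import re

def normalize_field_name(name):
    """Replace runs of spaces/special characters with underscores so the
    field can be referenced in queries, e.g. short_name:name1."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_").lower()

# "short name" -> "short_name", "Full Name" -> "full_name"
```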


On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84 bbar...@gmail.com wrote:


 Hi,

 We have some dynamic fields getting indexed using SOLR. Some of the dynamic
 fields contains spaces / special character (something like: short name,
 Full
 Name etc...). Is there a way to search on these fields (which contains the
 spaces etc..). Can someone let me know the filter I need to pass to do this
 type of search?

 I tried with short name:name1 -- this didnt work..

 Thanks,
 Barani
 --
 View this message in context:
 http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: get Server Status, TotalDocCount .... PHP !

2010-03-02 Thread Israel Ekpo
The last time I tried using SolrPHPClient for this stuff, it did not really
handle the response very well because of the JSON response generated on the
server side.

I am not sure if anything has changed since then.

The JSON code generated could not be parsed properly.

If you do not want to analyze the XML response each time, and if you are not
using the PECL extension, you will need to send a request manually to the
Solr server using cURL and specify the response format as phps.


On Tue, Mar 2, 2010 at 9:59 AM, stocki st...@shopgate.com wrote:


 Hey-

 No i use the SolrPHPClient http://code.google.com/p/solr-php-client/
 i not really want tu use two different php-libs. ^^

 what do you mean with unserialize ? XD





 Guillaume Rossolini-2 wrote:
 
  Hi
 
  Have you tried the php_solr extension from PECL?  It has a handy
  SolrPingResponse class.
  Or you could just call the CORENAME/admin/ping?wt=phps URL and
 unserialize
  it.
 
  Regards,
 
  --
  I N S T A N T  |  L U X E - 44 rue de Montmorency | 75003 Paris | France
  Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web :
 www.instantluxe.com
 
 
  On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote:
 
 
  hello
 
  I use Solr in my cakePHP Framework.
 
  How can i get status information of my solr cores ??
 
  I dont want analyze everytime the responseXML.
 
  do anybody know a nice way to get status messages from solr ?
 
  thx ;) Jonas
  --
  View this message in context:
 
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756852.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: updating particular field

2010-03-01 Thread Israel Ekpo
Unfortunately, because of how Lucene works internally, you will not be able
to update just one or two fields. You have to resubmit the entire document.

If you send only one or two fields, then the updated document will only
have the fields sent in the last update.
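For example, using the document quoted below, flipping inStock means re-posting the full document with every field repeated, not just the changed one (abridged here; the features and manufacture date fields would be repeated as well):

```xml
<add>
  <doc>
    <field name="id">EN7800GTX/2DHTV/256M</field>
    <field name="manu">ASUS Computer Inc.</field>
    <field name="cat">electronics</field>
    <field name="cat">graphics card</field>
    <field name="price">479.95</field>
    <field name="popularity">7</field>
    <field name="inStock">true</field> <!-- the only changed value -->
  </doc>
</add>
```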

On Mon, Mar 1, 2010 at 7:09 AM, Suram reactive...@yahoo.com wrote:




 Siddhant wrote:
 
  Yes. You can just re-add the document with your changes, and the rest of
  the
  fields in the document will remain unchanged.
 
  On Mon, Mar 1, 2010 at 5:09 PM, Suram reactive...@yahoo.com wrote:
 
 
  Hi,
 
  <doc>
    <field name="id">EN7800GTX/2DHTV/256M</field>
    <field name="manu">ASUS Computer Inc.</field>
    <field name="cat">electronics</field>
    <field name="cat">graphics card</field>
    <field name="features">NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz</field>
    <field name="features">256MB GDDR3 Memory clocked at 1.35GHz</field>
    <field name="price">479.95</field>
    <field name="popularity">7</field>
    <field name="inStock">false</field>
    <field name="manufacturedate_dt">2006-02-13T15:26:37Z/DAY</field>
  </doc>
 
  can i possible to update field name=inStocktrue/field without
  affect
  any field of my previous document
 
  Thanks in advance
  --
  View this message in context:
 
 http://old.nabble.com/updating-particular-field-tp27742399p27742399.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  - Siddhant
 
 


 Hi,
   Here i don't want to reload entire data just i want u update a field
 i need to change(ie one or more field with id not whole)


 --
 View this message in context:
 http://old.nabble.com/updating-particular-field-tp27742399p27742671.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: If you could have one feature in Solr...

2010-02-24 Thread Israel Ekpo
Grant,

One feature that I would like to see is the ability to do a Bitwise search

I have had to work around this with a Query Parser plugin that uses a
org.apache.lucene.search.Filter

I think having this feature would be very nice, and I prefer it to searching
with multiple OR-type queries, especially when the bits are known ahead of
time.
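For illustration, the predicate such a bitwise filter would apply per document can be sketched like this (plain Python, not the actual Lucene filter code; the flag layout and document ids are hypothetical):

```python
def bitwise_and_matches(stored_value, mask):
    """Keep a document if all bits set in `mask` are also set in the
    integer field stored for that document."""
    return (stored_value & mask) == mask

# Documents storing feature flags as an integer field.
docs = {1: 0b1011, 2: 0b0100, 3: 0b1111}
matching = [doc_id for doc_id, flags in docs.items()
            if bitwise_and_matches(flags, 0b0011)]
# docs 1 and 3 have both low bits set; doc 2 does not
```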

I can submit the code as a patch once I get the approval to do so.

On Wed, Feb 24, 2010 at 2:20 PM, straup str...@gmail.com wrote:

 I actually found the documentation pretty great especially since (my
 experience, anyway) most Java projects seem to default to generic JavaDoc
 derived documentation (and that makes me cry).

 That said, more cookbook-style recipes or stories would be helpful for
 some of the more esoteric parts of Solr.

 Also: real-time indexing and geo.

 Cheers,


 On 2/24/10 9:54 AM, Grant Ingersoll wrote:


 On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:

  Decent documentation.


 What parts do you feel are lacking?  Or is it just across the board?
  Wikis are both good and bad for documentation, IMO.

 -Grant







Re: Question about custom Lucene filters and Solr

2010-02-16 Thread Israel Ekpo
Hi Jon,

You will need to write a plugin.

You will need a custom query parser and an update handler, depending on what
you are doing.

Implementing an Update Handler or Update Request Processor is not
recommended because it is considered advanced.

Take a look at the following links for more information

http://wiki.apache.org/solr/SolrPlugins

http://wiki.apache.org/solr/UpdateRequestProcessor

http://lucene.apache.org/solr/api/org/apache/solr/update/UpdateHandler.html

http://lucene.apache.org/solr/api/org/apache/solr/search/QParserPlugin.html

http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html

On Tue, Feb 16, 2010 at 2:43 PM, Jon Bodner jon.bod...@it.com wrote:

 Hello,

 I'm interested in using Solr with a custom Lucene Filter (like the one
 described in section 6.4.1 of the Lucene In Action, Second Edition book).
  I'd like to filter search results from a Lucene index against information
 stored in a relational database.  I don't want to move the relational
 database information into the search index, because it could change
 frequently.

 I looked at writing my own custom Solr SearchComponent, but the
 documentation for those seems slim.  Is this the correct approach?  Is there
 another way?

 Thanks,

 Jon






Re: Basic questions about Solr cost in programming time

2010-01-26 Thread Israel Ekpo
On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump jcr...@hq.mercycorps.orgwrote:

 Hi,
 I hope this message is OK for this list.

 I'm looking into search solutions for an intranet site built with Drupal.
 Eventually we'd like to scale to enterprise search, which would include the
 Drupal site, a document repository, and Jive SBS (collaboration software).
 I'm interested in Lucene/Solr because of its scalability, faceted search
 and
 optimization features, and because it is free. Our problem is that we are a
 non-profit organization with only three very busy programmers/sys admins
 supporting our employees around the world.

 To help me argue for Solr in terms of total cost, I'm hoping that members
 of
 this list can share their insights about the following:

 * About how many hours of programming did it take you to set up your
 instance of Lucene/Solr (not counting time spent on optimization)?


For me, this generally took 30 to 70 hours to create the entire search
application, depending on the features of the web application and the
complexity of the site.


 * Are there any disadvantages of going with a certified distribution rather
 than the standard distribution?


 The people at Lucid Imagination can probably provide a better answer for
this. It is not really a disadvantage to go with the certified version but
you may have to pay in order to get the certified distribution. However, you
will get dedicated support if you happen to run into any issues or need
technical assistance.

If you use the standard version you can always get help from the mailing
list if you have any issues.



 Thanks and best regards,
 Jeff

 Jeff Crump
 jcr...@hq.mercycorps.org















Re: What is this error means?

2010-01-12 Thread Israel Ekpo
Ellery,

A preliminary look at the source code indicates that the error is happening
because the Solr server is taking longer than expected to respond to the
client.

http://code.google.com/p/solr-php-client/source/browse/trunk/Apache/Solr/Service.php

The default timeout handed down to Apache_Solr_Service::_sendRawPost() is 60
seconds, since you were calling the addDocument() method.

So if it took longer than that (1 minute), then it will exit with that error
message.

You will have to increase the default value to something much higher, like 10
minutes or so, on line 252 in the source code, since there is no way to
specify it in the constructor or the addDocument() method.

Another alternative is to update the default_socket_timeout setting in the
php.ini file, or in code using ini_set().
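For the php.ini route, the change is a single setting (600 seconds here is just an illustrative value; the shipped default is 60):

```ini
; php.ini -- raise the socket timeout so long-running indexing
; requests are not cut off after the default 60 seconds
default_socket_timeout = 600
```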

I hope that helps



On Tue, Jan 12, 2010 at 9:33 PM, Ellery Leung elleryle...@be-o.com wrote:


 Hi, here is the stack trace:

 Fatal error:  Uncaught exception 'Exception' with message '"0" Status: Communication Error' in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php:385
 Stack trace:
 #0 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(652): Apache_Solr_Service->_sendRawPost('http://127.0.0', '<add allowDups=...')
 #1 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(676): Apache_Solr_Service->add('<add allowDups=...')
 #2 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(221): Apache_Solr_Service->addDocument(Object(Apache_Solr_Document))
 #3 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(262): SolrSearchEngine->buildIndex(Array, 'key')
 #4 C:\nginx\html\apps\milio\lib\System\classes\Indexer\Indexer.class.php(51): SolrSearchEngine->createFullIndex('contacts', Array, 'key', 'www')
 #5 C:\nginx\html\apps\milio\lib\System\functions\createIndex.php(64): Indexer->create('www')
 #6 {main}
   thrown in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php on line 385

 C:\nginx\html\apps\milio\htdocs\Contacts>pause
 Press any key to continue . . .

 Thanks for helping me.


 Grant Ingersoll-6 wrote:
 
  Do you have a stack trace?
 
  On Jan 12, 2010, at 2:54 AM, Ellery Leung wrote:
 
  When I am building the index for around 2 ~ 25000 records, sometimes
  I
  came across with this error:
 
 
 
  Uncaught exception Exception with message '0' Status: Communication
  Error
 
 
 
  I search Google  Yahoo but no answer.
 
 
 
  I am now committing document to solr on every 10 records fetched from a
  SQLite Database with PHP 5.3.
 
 
 
  Platform: Windows 7 Home
 
  Web server: Nginx
 
  Solr Specification Version: 1.4.0
 
  Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
  12:33:40
 
  Lucene Specification Version: 2.9.1
 
  Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25
 
  Solr hosted in jetty 6.1.3
 
 
 
  All the above are in one single test machine.
 
 
 
  The situation is that sometimes when I build the index, it can be
 created
  successfully.  But sometimes it will just stop with the above error.
 
 
 
  Any clue?  Please help.
 
 
 
  Thank you in advance.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/What-is-this-error-means--tp27123815p27138658.html
 Sent from the Solr - User mailing list archive at Nabble.com.






[PECL-DEV] [ANNOUNCEMENT] solr-0.9.9 (beta) Released

2010-01-10 Thread Israel Ekpo
The new PECL package solr-0.9.9 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Fixed Bug #17009 Creating two SolrQuery objects leads to wrong query value
- Reset the buffer for the request data from the previous request in
SolrClient
- Added new internal static function solr_set_initial_curl_handle_options()
- Moved the initialization of CURL handle options to the
solr_set_initial_curl_handle_options() function
- Resetting the CURL options on the (CURL *) handle after each request is
completed
- Added more explicit error message to indicate that cloning SolrParams
objects and its descendants is currently not yet supported

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
available in Solr 1.4. The extension has features such as built-in,
serializable query string builder objects which effectively simplifies the
manipulation of name-value pair request parameters across repeated requests.
The response from the Solr server is also automatically parsed into native
php objects whose properties can be accessed as array keys or object
properties without any additional configuration on the client-side. Its
advanced HTTP client reuses the same connection across multiple requests and
provides built-in support for connecting to Solr servers secured behind HTTP
Authentication or HTTP proxy servers. It is also able to connect to
SSL-enabled containers. Please consult the documentation for more details on
features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.9.tgz
Documentation: http://us.php.net/solr

Authors
-
Israel Ekpo ie...@php.net (lead)


Re: Help with creating a solr schema

2010-01-01 Thread Israel Ekpo
On Thu, Dec 31, 2009 at 10:26 AM, JaredM emru...@gmail.com wrote:


 Hi,

 I'm new to Solr but so far I think its great.  I've spent 2 weeks reading
 through the wiki and mailing list info.

 I have a use case and I'm not sure what the best way is to implement it.  I
 am keeping track of peoples calendar schedules in a really simple way: each
 user can login and input a number of date ranges where they are available
 (so for example - User Alice might be available between 1-Jan-2010 -
 15-Jan-2010 and 20-Feb-2010 - 22-Feb-2010 and 1-Mar-2010-5-Mar-2010.

 In my data model I have this modelled as a one-to-many with a User table
 (consisting of username, some metadata) and an Availability table
 (consisting of start date and end date).

 Now I need to search which users are available between a given date range.
 The bit I'm having trouble with is how to store multiple start  end date
 pairs.  Can someone provide some guidance?
 --
 View this message in context:
 http://old.nabble.com/Help-with-creating-a-solr-schema-tp26979319p26979319.html
 Sent from the Solr - User mailing list archive at Nabble.com.



I have done something similar to this before.

You will have to store the username, firstname, and lastname as single-valued
fields:

<field name="username" type="string" indexed="true" stored="true" required="true"/>
<field name="firstname" type="string" indexed="true" stored="true"/>
<field name="lastname" type="string" indexed="true" stored="true"/>
<field name="start_date" type="tint" indexed="true" stored="true" multiValued="true"/>
<field name="end_date" type="tint" indexed="true" stored="true" multiValued="true"/>

However, the start and end dates should be multi-valued tint types.

I decided to store the dates as Unix timestamps. The start dates are stored
as the Unix timestamp at midnight (00:00:00) of the start date.

The end dates are stored as the Unix timestamp at 11:59:59 PM (23:59:59) on
the end date.

This (storing the dates as Trie integers) gave me faster range query
results.

When searching, you will also have to convert the dates to Unix timestamps
using similar logic before using them in the Solr search query.
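A sketch of that conversion, assuming the dates are interpreted in UTC:

```python
from datetime import datetime, timezone

def day_bounds(year, month, day):
    """Return the Unix timestamps for 00:00:00 and 23:59:59 (UTC) of the
    given calendar date, for the start_date/end_date fields."""
    start = int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())
    end = start + 86399  # 23:59:59 on the same day
    return start, end

# day_bounds(2010, 1, 10) -> (1263081600, 1263167999)
```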

You should use the username of the user as the uniqueKey.

If a user has multiple dates of availability, you will enter it like so:

<add>
  <doc>
    <field name="username">exampleun</field>
    <field name="firstname">examplefnf</field>
    <field name="lastname">exampleln</field>
    <field name="start_date">137865661</field>
    <field name="start_date">137865662</field>
    <field name="start_date">137865663</field>
    <field name="end_date">137865681</field>
    <field name="end_date">137865682</field>
    <field name="end_date">137865683</field>
  </doc>
</add>




Re: solr 1.4 csv import -- Document missing required field: id

2010-01-01 Thread Israel Ekpo
On Fri, Jan 1, 2010 at 9:13 PM, evana evre...@ymail.com wrote:


 Hi,

 I am trying to import a csv file (without id field) on solr 1.4
 In schema.xml id field is set with required=false.
 But I am getting org.apache.solr.common.SolrException: Document missing
 required field: id

 Following is the schema.xml fields section
 <fields>
   <field name="id" type="string" indexed="true" stored="true" required="false" />
   <field name="name" type="textgen" indexed="true" stored="true"/>
   <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="random_*" type="random" />
   <dynamicField name="*" type="string" indexed="true"/>
 </fields>

 <uniqueKey>id</uniqueKey>


 Following is the csv file
company_id,customer_name,active
58,Apache,Y
58,Solr,Y
58,Lucene,Y
60,IBM,Y

 Following is the solrj import client
 SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
 req.addFile(new File(filename));
 req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 NamedList result = server.request(req);
 System.out.println("Result: " + result);


 Could any of you help out please.

 Thanks
 --
 View this message in context:
 http://old.nabble.com/solr-1.4-csv-import-Document-missing-required-field%3A-id-tp26990048p26990048.html
 Sent from the Solr - User mailing list archive at Nabble.com.


The presence of the uniqueKey definition implies that the id field is
required, even though the required attribute is set to false on the field
definition.

Try removing the uniqueKey definition for the id field in the schema.xml
file and then try again to run the update script or application.

The uniqueKey definition is not needed if you are going to build the index
from scratch each time you do the import.

However, if you are doing incremental updates, this field is required, and
the uniqueKey definition is also needed to specify what the primary key
for the document is.

http://wiki.apache.org/solr/UniqueKey
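In schema.xml, removing the definition just means deleting or commenting out one line (again, only safe when you rebuild the index from scratch on each import):

```xml
<!-- <uniqueKey>id</uniqueKey> -->
```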




Re: Help with creating a solr schema

2010-01-01 Thread Israel Ekpo
On Fri, Jan 1, 2010 at 9:47 PM, JaredM emru...@gmail.com wrote:


 Thanks Ahmet and Israel.  I prefer Israel's approach since the amount of
 metadata for the user is quite high but I'm not clear how to get around one
 problem:

 If I had 2 availabilities (I've left it in human-readable form instead of
 as
 a UNIX timestamp only for ease of understanding):

 field name=start_date10-Jan-2010/field
 field name=start_date20-Jan-2010/field
 field name=end_date25-Jan-2010/field
 field name=end_date28-Jan-2010/field

 and I wanted to query for availability between 12-Jan-2010 to 26-Jan-2010
 then then wouldn't the above document be returned (even though the use
 would
 not be available 20-25 Jan?
 --
 View this message in context:
 http://old.nabble.com/Help-with-creating-a-solr-schema-tp26979319p26990178.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Unfortunately,

For this particular use case, if you are using the out-of-the-box features
available in Solr 1.4, without a custom Solr plugin using a custom Lucene
filter and some special value storage arrangement for the fields, you will
have to store each start and end date as a separate document. So, there will
be N separate documents for each username if that user has N distinct
periods of availability. The start date and end date fields would also have
to be single-valued instead of multi-valued as I specified in the earlier
post.

Sorry.


Re: How use implement Lucene for perl.

2009-12-28 Thread Israel Ekpo
I think you need to send a message to the lucene mailing list instead if you
want to use Lucene directly.

java-u...@lucene.apache.org

The API core Javadoc page has a very simple example which you can use to get
started with a few modifications.

http://lucene.apache.org/java/3_0_0/api/core/index.html

Use the documentation to select the appropriate constructor and method
signatures.

On the other hand, I think Solr can do everything that you need without the
need to interact directly with the lucene API

On Mon, Dec 28, 2009 at 11:42 PM, Maheshwar maheshwar2...@gmail.com wrote:


 I am new for Lucene.
 I haven't any idea about Lucene.
 I want to implement Lucene in my search script.
 Please guide me what I needs to be do for Lucene implementation.

 Actually, I want to integrate lucene search with message board system where
 people come to post new topic, edit that topic and delete that on needs. I
 want, to update search index at every action.
 So I need some valuable help.



 --
 View this message in context:
 http://old.nabble.com/How-use-implement-Lucene-for-perl.-tp26951130p26951130.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: NOT highlighting synonym

2009-12-28 Thread Israel Ekpo
I think what Erik was referring to was creating a separate copy field with
a different analyzer: you copy the original value to that copy field and
index it differently.

That way you can use one field for search and another one to display the
highlighting results.



On Mon, Dec 28, 2009 at 1:00 PM, darniz rnizamud...@edmunds.com wrote:


 Thanks
 Unfortunately thats not the case.
 We are using the same field to do search on and display that text.
 So looks like in this case this is not possible
 Am i correct


 We have a custom field type with synonyms defined at query time.

 Erik Hatcher-4 wrote:
 
 
  On Dec 23, 2009, at 2:26 PM, darniz wrote:
  i have a requirement where we dont want to hightlight synonym matches.
  for example i search for caddy and i dont want to highlight matched
  synonym
  like cadillac.
  Looking at highlighting parameters i didn't find any support for this.
  anyone can offer any advice.
 
  You can control what gets highlighted by which analyzer is used.  You
  may need a different field for highlighting than you use for searching
  in this case - but you can just create another field type without the
  synonym filter in it and use that for highlighting.
 
Erik
 
 
 

 --
 View this message in context:
 http://old.nabble.com/NOT-highlighting-synonym-tp26906321p26945921.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: solr php client vs file_get_contents?

2009-12-15 Thread Israel Ekpo
On Tue, Dec 15, 2009 at 8:49 AM, Faire Mii faire@gmail.com wrote:

 i am using php to access solr and i wonder one thing.

 why should i use solr php client when i can use

 $serializedResult = file_get_contents('http://localhost:8983/solr/
 select?q=niklaswt=phps');

 to get the result in arrays and then print them out?

 i dont really get the difference. is there any richer features with the php
 client?


 regards

 fayer



Hi Faire,

Have you actually used this library before? I think the library is pretty
well thought out.

From a simple glance at the source code you can see that one can use it for
the following purposes:

1. Adding documents to the index (which you cannot just do with
file_get_contents alone). So that's one difference.

2. Updating existing documents

3. Deleting existing documents.

4. Balancing requests across multiple backend servers

There are other operations with the Solr server that the library can also
perform.

Some example of what I am referring to is illustrated here

http://code.google.com/p/solr-php-client/wiki/FAQ

http://code.google.com/p/solr-php-client/wiki/ExampleUsage

IBM also has an interesting article illustrating how to add documents to the
Solr index and issue commit and optimize calls using this library.

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

The author of the library can probably give you more details on what the
library has to offer.

I think you should download the source code and spend some time looking at
all the features it has to offer.

In my opinion, it is not fair to compare a well thought out library like
that with a simple php function.


Re: Can solr web site have multiple versions of online API doc?

2009-12-15 Thread Israel Ekpo
2009/12/15 Teruhiko Kurosaka k...@basistech.com

 Lucene keeps multiple versions of its API doc online at
 http://lucene.apache.org/java/X_Y_Z/api/all/index.html
 for version X.Y.Z.  I am finding this very useful when
 comparing different versions.  This is also good because
 the javadoc comments that I write for my software can
 reference the API comments of the exact version of
 Lucene that I am using.

 At Solr site, I can only find the API doc of the trunk
 build.  I cannot find 1.3.0 API doc, for example.

 Can Solr site also maintain the API docs for the past
 stable versions ?

 -kuro


Hi Teruhiko

If you downloaded the 1.3.0 release, you should find a docs folder inside
the zip file.

This contains the javadoc for that particular release.

You may also re-download the 1.3.0 release to get the docs for Solr 1.3.

I hope this helps.



Re: apache-solr-common.jar

2009-12-14 Thread Israel Ekpo
2009/12/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 there is no solrcommon jar anymore. you may use the solrj jar which
 contains all the classes which were there in the comon jar.

 On Mon, Dec 14, 2009 at 9:22 PM, gudumba l gudumba.sm...@gmail.com
 wrote:
  Hello All,
I have been using apache-solr-common-1.3.0.jar in my
 module.
  I am planning to shift to the latest version, because of course it has
 more
  flexibility. But it is really strange that I dont find any corresponding
 jar
  of the latest version. I have serached in total apachae solr 1.4 folder
  (which is downloaded from site), but have not found any. , I am sorry,
 its
  really silly to request for a jar, but have no option.
  Thanks.
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



I had the same question too earlier last week and I found out after some
research where the packages are bundled.

The specific jar is in the dist folder as
apache-solr-1.4.0/dist/apache-solr-solrj-1.4.0.jar

This was where I found the classes in the org.apache.solr.common.* packages

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Sol server is not set up ??

2009-12-11 Thread Israel Ekpo
On Fri, Dec 11, 2009 at 7:54 AM, regany re...@newzealand.co.nz wrote:


 Hello!

 I'm trying to successfully build/install the PHP Solr Extension, but am
 running into an error when doing a make test - the following 4 tests
 fail,
 the other 17 pass. The Solr server is definately running because I can
 access it via the admin URL. Anyone know what else may be causing the make
 test to think teh solr server is not set up???

 regan

 =
 Running selected tests.
 TEST 1/21 [tests/solrclient_001.phpt]
 SKIP SolrClient::addDocument() - Sending a single document to the Solr
 server [tests/solrclient_001.phpt] reason: Solr server is not set up
 TEST 2/21 [tests/solrclient_002.phpt]
 SKIP SolrClient::addDocuments() - sending multiple documents to the Solr
 server [tests/solrclient_002.phpt] reason: Solr server is not set up
 TEST 3/21 [tests/solrclient_003.phpt]
 SKIP SolrClient::addDocuments() - sending a cloned document
 [tests/solrclient_003.phpt] reason: Solr server is not set up
 TEST 4/21 [tests/solrclient_004.phpt]
 SKIP SolrClient::query() - Sending a chained query request
 [tests/solrclient_004.phpt] reason: Solr server is not set up
 --
 View this message in context:
 http://old.nabble.com/Sol-server-is-not-set-uptp26743824p26743824.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Hi Regan,

This is Israel, the author of the PHP extension.

There is nothing wrong with your Solr server, it is just a configuration
that you have to change in the test_config.php file before running the make
test command.

In the tests/test_config.php file you will have to change the value of *
SOLR_SERVER_CONFIGURED* from *false* to* true*.

You can view the contents of the file here in the repository

http://svn.php.net/viewvc/pecl/solr/trunk/tests/test.config.php?revision=290120view=markup

You also have to specify the correct values for the host name and port
numbers.

I am going to make some changes to the README files, the test scripts and
other documentation to make sure that this part is clear (why some tests may
be skipped). These changes should be available in the next update release
early next week.

So, please make these changes and try again. It should not be skipped this
time.

Also, I would like to know the version of the Solr extension, the PHP
version and the operating system you are using.

Please let me know if you need any help.

Sincerely,
Israel Ekpo

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: SolrClient::query(): Solr HTTP Error : 'Couldn't connect to server'

2009-12-11 Thread Israel Ekpo
On Fri, Dec 11, 2009 at 6:49 AM, regany re...@newzealand.co.nz wrote:


 hi, I've (hopefully correctly) install the solr php extension.

 But I'm receiving the following error when trying to run my test script:

 SolrClient::query(): Solr HTTP Error : 'Couldn't connect to server'

 Any ideas how to figure out why it's giving the error??

 regan


 ?php

 /* Domain name of the Solr server */
 define('SOLR_SERVER_HOSTNAME', 'localhost');

 define('SOLR_SERVER_PATH', '/solr/core0');

 /* Whether or not to run in secure mode */
 define('SOLR_SECURE', false );

 /* HTTP Port to connection */
 define('SOLR_SERVER_PORT', ((SOLR_SECURE) ? 8443 : 8983));

 $options = array(
'hostname' = SOLR_SERVER_HOSTNAME
,'port' = SOLR_SERVER_PORT
,'path' = SOLR_SERVER_PATH

 );

 $client = new SolrClient($options);
 $query = new SolrQuery();
 $query->setQuery('apple');
 $query->setStart(0);
 $query->setRows(50);
 $query_response = $client->query($query);
 print_r($query_response);
 $response = $query_response->getResponse();
 print_r($response);

 ?


 --
 View this message in context:
 http://old.nabble.com/SolrClient%3A%3Aquery%28%29%3A-Solr-HTTP-Error-%3A-%27Couldn%27t-connect-to-server%27-tp26742899p26742899.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Hi Regan,

I have the following questions:

0. What version of Apache Solr are you using? 1.3, 1.4, nightly builds?

1. What version of PHP are you using and on what operating system?

2. What version of the Solr extension are you using?

3. Which servlet container are you using for Solr? (Jetty, Tomcat, GlassFish,
etc.)

4. What is the hostname and port numbers and path to Solr? Is your port
number 8080 or 8983

Also, please let me know what the output of $client->getDebug() is. This
usually contains a very detailed record of what is happening during the
connection.
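A minimal sketch of how that output can be captured when a request fails — this assumes the PECL Solr extension is loaded and reuses the same $options array defined in your script:

```php
<?php
// Sketch only: print the client's debug/transfer log when a query fails.
// Assumes the PECL Solr extension and the $options array from the script above.
$client = new SolrClient($options);

try {
    $response = $client->query(new SolrQuery('apple'));
    print_r($response->getResponse());
} catch (SolrClientException $e) {
    echo $e->getMessage(), "\n";
    echo $client->getDebug(); // detailed record of the connection attempt
}
?>
```

The getDebug() output includes the raw request/response exchange, which usually makes a wrong host, port, or path obvious at a glance.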

I would be happy to help you troubleshoot any errors you are having.


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Indexing content on Windows file shares?

2009-12-10 Thread Israel Ekpo
If you are looking to index websites, Nutch would be a better alternative.

However, Solr could be useful for indexing text files.

There is documentation here for how to add data to the index

http://lucene.apache.org/solr/tutorial.html#Indexing+Data

http://wiki.apache.org/solr/#Search_and_Indexing

There are some clients here to add data to the index programmatically.

http://wiki.apache.org/solr/IntegratingSolr



On Thu, Dec 10, 2009 at 3:06 PM, Matt Wilkie matt.wil...@gov.yk.ca wrote:

 Hello,

 I'm new to Solr, I know nothing about it other than it's been touted in a
 couple of places as a possible competitor to Google Search Appliance, which
 is what brought me here. I'm looking for a search engine which can index
 files on windows shares and websites, and, hopefully, integrate with Active
 Directory to ensure results are not returned to users who don't have access
 to those files(s).

 Can Solr do this? If so where is the documentation for it? Reconnaisance
 searches of the mailing list and wiki have not turned up anything, so far.

 thanks,

 --
 matt wilkie
 
 Geomatics Analyst
 Information Management and Technology
 Yukon Department of Environment
 10 Burns Road * Whitehorse, Yukon * Y1A 4Y9
 867-667-8133 Tel * 867-393-7003 Fax
 http://environmentyukon.gov.yk.ca/geomatics/
 




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


SolrQuerySyntax : Types of Range Queries in Solr 1.4

2009-12-09 Thread Israel Ekpo
Hi Guys,

In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and
exclusive range searches with square and curly brackets respectively.

However, when I looked at the SolrQuerySyntax wiki page, only the inclusive
range search is illustrated.

It seems like the examples only talk about the inclusive range searches.

http://wiki.apache.org/solr/SolrQuerySyntax

Illustrative example:

There is a field in the index named 'year' and it contains the following
values :

2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010

year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive with
square brackets]
year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with curly
brackets}. The bounds are not included.
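For what it's worth, both forms can be passed straight through the PHP extension. This is only a sketch: it assumes the PECL Solr extension and a Solr server on localhost:8983 whose schema has an indexed 'year' field.

```php
<?php
// Sketch only: assumes the PECL Solr extension and a local Solr server
// whose schema has an indexed 'year' field.
$client = new SolrClient(array(
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr',
));

// Inclusive: square brackets, bounds are part of the match (2005..2009).
$inclusive = $client->query(new SolrQuery('year:[2005 TO 2009]'));

// Exclusive: curly brackets, bounds are left out (2006..2008 only).
$exclusive = $client->query(new SolrQuery('year:{2005 TO 2009}'));

echo $inclusive->getResponse()->response->numFound, "\n";
echo $exclusive->getResponse()->response->numFound, "\n";
?>
```

With the sample data above, the exclusive count should come back smaller than the inclusive one, since the two bound years are dropped.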

Is there any other page on the wiki where there are examples of exclusive
range searches with curly brackets?

If not I would like to know so that I can add some examples to the wiki.

Thanks.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: SolrQuerySyntax : Types of Range Queries in Solr 1.4

2009-12-09 Thread Israel Ekpo
On Wed, Dec 9, 2009 at 1:13 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 Solr standard query syntax is an extension of Lucene query syntax, and
 we reference that on the page:
 http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

 -Yonik
 http://www.lucidimagination.com

 On Wed, Dec 9, 2009 at 1:08 PM, Israel Ekpo israele...@gmail.com wrote:
  Hi Guys,
 
  In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and
  exclusive range searches with square and curly brackets respectively.
 
  However, when I looked at the SolrQuerySyntax, only the the include range
  search is illustrated.
 
  It seems like the examples only talk about the inclusive range searches.
 
  http://wiki.apache.org/solr/SolrQuerySyntax
 
  Illustrative example:
 
  There is a field in the index name 'year' and it contains the following
  values :
 
  2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010
 
  year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive
 with
  square brackets]
  year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with
 curly
  brackets}. The bounds are not included.
 
  Is there any other page on the wiki where there are examples of exclusive
  range searches with curly brackets?
 
  If not I would like to know so that I can add some examples to the wiki.
 
  Thanks.
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
  http://www.israelekpo.com/
 



Hi Yonik,

I saw that.

I posted the question because someone asked me how to do the exclusive
search where the bounds are excluded.

Initially they started with field:[lower-1 TO upper-1], and I told them to
use curly brackets instead; but when I came to the Solr wiki to search for
it, I did not see any examples with curly brackets.

For me this was very obvious, but I think it would be nice to add a few
examples with curly brackets to the SolrQuerySyntax examples because most
people that are using Solr for the very first time may not have heard of or
used Lucene before.

Just a thought.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: parsing the raw query string?

2009-12-06 Thread Israel Ekpo
Hi

If you are planning to use Solr via PHP, you can take a look at the Solr
PECL extension.

http://www.php.net/manual/en/book.solr.php

which you can download from here

http://pecl.php.net/package/solr

There is a SolrQuery class that allows you to build and manage the
name-value pair parameters which you can then pass on to the SolrClient
object for onward transmission to the Solr server. It is also serializable
so you can cache it in the $_SESSION variable to propagate the parameters
from page to page across requests.

The SolrQuery class has built-in methods to add, update, remove and manage
the Facets, Highlighting, MoreLikeThis, Stats, TermsComponent parameters, etc.

I hope this helps.

On Sun, Dec 6, 2009 at 1:25 AM, regany re...@newzealand.co.nz wrote:


 I've just found solr and am looking at what's involved to work with it. All
 the examples I've seen only ever use 1 word search terms being implemented
 as examples, which doesn't help me trying to see how multiple word queries
 work. It also looks like a hell of a lot of processing needs to be done on
 the raw query string even before you can pass it to solr (in PHP) - is
 everyone processing the query string first and building a custom call to
 solr, or is there a query string parser I've missed somewhere? I can't even
 find what operators (if any) are able to be used in the raw query string in
 the online docs (maybe there aren't any??). Any help or points in the right
 direction would be appreciated.
 --
 View this message in context:
 http://old.nabble.com/parsing-the-raw-query-string--tp26662578p26662578.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Stopping Starting

2009-12-03 Thread Israel Ekpo
On Thu, Dec 3, 2009 at 5:01 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Thu, Dec 3, 2009 at 4:57 PM, Lee Smith l...@weblee.co.uk wrote:
  Hello All
 
  I am just starting out today with solr and looking for some advice but I
  first have a problem.
 
  I ran the start command ie.
 
  user:~/solr/example$ java -jar start.jar
 
  Which worked perfect and started to explore the interface.
 
  But my terminal window dropped and I it has stopped working. If i try and
  restart it Im getting errors and its still not working.
 
  error like:
  2009-12-03 21:55:41.785::WARN:  EXCEPTION
  java.net.BindException: Address already in use
 
  So how can I stop and restart the service ?

 Try and find the java process and kill it?
 ps -elf | grep java
 kill pid

 If no other Java processes are running under user, then killall
 java is a quick way to do it (Linux has killall... not sure about
 other systems).

 -Yonik
 http://www.lucidimagination.com



On Ubuntu, CentOS and some other linux distros, you can run this command:

pkill -f start.jar

OR

pkill -f java

If there are no other java processes running


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


[PECL-DEV] [ANNOUNCEMENT] solr-0.9.8 (beta) Released

2009-12-03 Thread Israel Ekpo
The new PECL package solr-0.9.8 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Fixed config.w32 for Windows build support. (Pierre, Pierrick)
- Windows .dll now available at http://downloads.php.net/pierre (Pierre)
- Fixed Bug #16943 Segmentation Fault from solr_encode_string() during
attempt to retrieve solrXmlNode-children-content when
solrXmlNode-children is NULL (Israel)
- Disabled Expect header in libcurl (Israel)
- Disabled Memory Debugging when normal debug is enabled (Israel)
- Added list of contributors to the project (README.CONTRIBUTORS)

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
available in Solr 1.4. The extension has features such as built-in,
serializable query string builder objects which effectively simplifies the
manipulation of name-value pair request parameters across repeated requests.
The response from the Solr server is also automatically parsed into native
php objects whose properties can be accessed as array keys or object
properties without any additional configuration on the client-side. Its
advanced HTTP client reuses the same connection across multiple requests and
provides built-in support for connecting to Solr servers secured behind HTTP
Authentication or HTTP proxy servers. It is also able to connect to
SSL-enabled containers. Please consult the documentation for more details on
features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.8.tgz
Documentation: http://www.php.net/manual/en/book.solr.php

Authors
-
Israel Ekpo ie...@php.net (lead)

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Announcing the Apache Solr extension in PHP - 0.9.0

2009-11-23 Thread Israel Ekpo
Hi Mike,

Thanks to Pierre, Windows builds of the extension are available here,
compiled from trunk r291135

http://downloads.php.net/pierre/

I am planning to have 0.9.8 compiled for Windows as soon as it is out,
sometime later this week.

The 1.0 release should be out sometime before mid December after the API is
finalized and tested.

You can always check the project home page for news about upcoming releases

http://pecl.php.net/package/solr

The documentation is available here
http://www.php.net/manual/en/book.solr.php

Cheers


On Mon, Nov 23, 2009 at 3:28 PM, Michael Lugassy mlu...@gmail.com wrote:

 Thanks Israel, exactly what I was looking for, but how would one get a
 pre-compiled dll for windows? using PHP 5.3 VS9 TS.

 On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo israele...@gmail.com wrote:
  Fellow Apache Solr users,
 
  I have been working on a PHP extension for Apache Solr in C for quite
  sometime now.
 
  I just finished testing it and I have completed the initial user level
  documentation of the API
 
  Version 0.9.0-beta has just been released.
 
  It already has built-in readiness for Solr 1.4
 
  If you are using Solr 1.3 or later in PHP, I would appreciate if you
 could
  check it out and give me some feedback.
 
  It is very easy to install on UNIX systems. I am still working on the
 build
  for windows. It should be available for Windows soon.
 
  http://solr.israelekpo.com/manual/en/solr.installation.php
 
  A quick list of some of the features of the API include :
  - Built in serialization of Solr Parameter objects.
  - Reuse of HTTP connections across repeated requests.
  - Ability to obtain input documents for possible resubmission from query
  responses.
  - Simplified interface to access server response data (SolrObject)
  - Ability to connect to Solr server instances secured behind HTTP
  Authentication and proxy servers
 
  The following components are also supported
  - Facets
  - MoreLikeThis
  - TermsComponent
  - Stats
  - Highlighting
 
  Solr PECL Extension Homepage
  http://pecl.php.net/package/solr
 
  Some examples are available here
  http://solr.israelekpo.com/manual/en/solr.examples.php
 
  Interim Documentation Page until refresh of official PHP documentation
  http://solr.israelekpo.com/manual/en/book.solr.php
 
  The C source is available here
  http://svn.php.net/viewvc/pecl/solr/
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
 




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


[ANNOUNCEMENT] solr-0.9.7 (beta) Released

2009-11-17 Thread Israel Ekpo
The new PECL package solr-0.9.7 (beta) has been released at
http://pecl.php.net/.

Release notes
-
- Fixed bug 16924 AC_MSG_NOTICE() is undefined in autoconf 2.13
- Added new method SolrClient::getDebug()
- Modified SolrClient::__construct() so that port numbers and other integer
values for the options can be passed as strings.
- Changed internal string handling mechanism to allow for tracking of memory
allocation in debug mode.
- Lowered minimum php version to 5.2.3. Unfortunately, this is the lowest
PHP version that will be supported. PHP versions lower than 5.2.3 are not
compatible or are causing tests to FAIL.
- Added php stubs for code-completion assists in IDEs and editors.
- Added more examples

Package Info
-
It effectively simplifies the process of interacting with Apache Solr using
PHP5 and it already comes with built-in readiness for the latest features
available in Solr 1.4. The extension has features such as built-in,
serializable query string builder objects which effectively simplifies the
manipulation of name-value pair request parameters across repeated requests.
The response from the Solr server is also automatically parsed into native
php objects whose properties can be accessed as array keys or object
properties without any additional configuration on the client-side. Its
advanced HTTP client reuses the same connection across multiple requests and
provides built-in support for connecting to Solr servers secured behind HTTP
Authentication or HTTP proxy servers. It is also able to connect to
SSL-enabled containers. Please consult the documentation for more details on
features.

Related Links
-
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.7.tgz

Authors
-
Israel Ekpo ie...@php.net (lead)


Re: Solr 1.3 query and index perf tank during optimize

2009-11-17 Thread Israel Ekpo
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : Basically, search entries are keyed to other documents.  We have finite
 : storage,
 : so we purge old documents.  My understanding was that deleted documents
 : still
 : take space until an optimize is done.  Therefore, if I don't optimize,
 the
 : index
 : size on disk will grow without bound.
 :
 : Am I mistaken?  If I don't ever have to optimize, it would make my life
 : easier.

 deletions are purged as segments get merged.  if you want to force
 deleted documents to be purged, the only way to do that at the
 moment is to optimize (which merges all segments).  but if you are
 continually deleteing/adding documents, the deletions will eventaully get
 purged even if you never optimize.




 -Hoss



Chris,

The mergeFactor controls the segment merge frequency and size, and the
number of segments is limited to mergeFactor - 1.

Would one be correct to state that if some documents have been deleted from
the index and the changes finalized with a call to commit, then as more
documents are added, eventually the index will be implicitly *optimized*
and the deleted documents will be purged, even without explicitly issuing
an optimize statement?


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: PhP, Solr and Delta Imports

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 2:49 PM, Pablo Ferrari pabs.ferr...@gmail.comwrote:

 Hello,

 I have an already working Solr service based un full imports connected via
 php to a Zend Framework MVC (I connect it directly to the Controller).
 I use the SolrClient class for php which is great:
 http://www.php.net/manual/en/class.solrclient.php

 For now on, every time I want to edit a document I have to do a full import
 again or I can delete the document by its id and add it again with the
 updated info...
 Anyone can guide me a bit in how to do delta imports? If its via php,
 better!

 Thanks in advance,

 Pablo Ferrari
 Tinkerlabs.net



Hello Pablo,

You have a couple of options and you do not have to do a full data re-import
for the entire index.

My example below uses 'doc_id' as the uniqueKey field in your schema. It
also assumes that it is an integer type.

1. You can remove the document from the index by query or by id (assuming
you have its id or uniqueKey field) if you want to just take it out of the
active index.

$client = new SolrClient($options);

$client->deleteById(400); // I recommend this one

OR

$client->deleteByQuery('doc_id:400'); // This should work too.

2. If all you want to do is to replace/update an existing document in the
Solr index and you still want the document to remain active in the index
then you can just update it by building a SolrInputDocument object and then
submitting just that document using the SolrClient.

$client = new SolrClient($options);

$doc = new SolrInputDocument();

$doc->addField('doc_id', 334455);
$doc->addField('other_field', 'Other Field Value');
$doc->addField('another_field', 'Another Field Value');

$updateResponse = $client->addDocument($doc);

If your changes are coming from the database, it would be helpful to have a
timestamp column that changes each time the record is modified.

Then you can keep track of when the last indexing run was done, and the
next time retrieve only 'active' documents that have been modified or
created since that run. You can send the SolrInputDocuments to the Solr
index using the SolrClient object as shown above, one per document.

Do not forget to save the changes to the index with a call to
SolrClient::commit()

If you are updating a lot of records, I would recommend waiting until the
end to do the commit (and the optimize call, if needed).
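As a rough sketch of that loop — the table name (products), the modified_at column, the PDO connection details, and the last_run.txt bookmark file are all assumptions for illustration, not anything Solr requires:

```php
<?php
// Sketch of a timestamp-driven delta import. The table/column names, the
// PDO DSN, and the bookmark file are illustrative assumptions; adapt them
// to your own schema. Assumes the PECL Solr extension and a local server.
$client = new SolrClient(array('hostname' => 'localhost', 'port' => 8983, 'path' => '/solr'));
$db     = new PDO('mysql:host=localhost;dbname=shop', 'dbuser', 'dbpass');

$lastRun = trim(file_get_contents('last_run.txt')); // e.g. '2009-11-16 00:00:00'

$stmt = $db->prepare('SELECT doc_id, title FROM products WHERE modified_at > ?');
$stmt->execute(array($lastRun));

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $doc = new SolrInputDocument();
    $doc->addField('doc_id', $row['doc_id']);
    $doc->addField('title', $row['title']);
    $client->addDocument($doc); // replaces any existing doc with the same doc_id
}

$client->commit(); // save all the changes with a single commit at the end
file_put_contents('last_run.txt', date('Y-m-d H:i:s'));
?>
```

Because doc_id is the uniqueKey, re-adding a modified document replaces the old copy, so there is no need to delete it first.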

More examples are available here

http://us2.php.net/manual/en/solr.examples.php

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Solr - Load Increasing.

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.orgwrote:

 Probably lakh: 100,000.

 So, 900k qpd and 3M docs.

 http://en.wikipedia.org/wiki/Lakh

 wunder

 On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:

  Hi,
 
  Your autoCommit settings are very aggressive.  I'm guessing that's what's
 causing the CPU load.
 
  btw. what is laks?
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: kalidoss kalidoss.muthuramalin...@sifycorp.com
  To: solr-user@lucene.apache.org
  Sent: Mon, November 16, 2009 9:11:21 AM
  Subject: Solr - Load Increasing.
 
  Hi All.
 
My server solr box cpu utilization  increasing b/w 60 to 90% and some
 time
  solr is getting down and we are restarting it manually.
 
No of documents in solr 30 laks.
No of add/update requrest solr 30 thousand / day. Avg of every 30
 minutes
  around 500 writes.
No of search request 9laks / day.
Size of the data directory: 4gb.
 
 
My system ram is 8gb.
System available space 12gb.
processor Family: Pentium Pro
 
Our solr data size can be increase in number like 90 laks. and writes
 per day
  will be around 1laks.   - Hope its possible by solr.
 
For write commit i have configured like
 
   1
   10
 
 
Is all above can be possible? 90laks datas and 1laks per day writes
 and
  30laks per day read??  - if yes what type of system configuration would
 require.
 
Please suggest us.
 
  thanks,
  Kalidoss.m,
 
 
  Get your world in your inbox!
 
  Mail, widgets, documents, spreadsheets, organizer and much more with
 your
  Sifymail WIYI id!
  Log on to http://www.sify.com
 
  ** DISCLAIMER **
  Information contained and transmitted by this E-MAIL is proprietary to
 Sify
  Limited and is intended for use only by the individual or entity to
 which it is
  addressed, and may contain information that is privileged, confidential
 or
  exempt from disclosure under applicable law. If this is a forwarded
 message, the
  content of this E-MAIL may not have been sent with the authority of the
 Company.
  If you are not the intended recipient, an agent of the intended
 recipient or a
  person responsible for delivering the information to the named
 recipient,  you
  are notified that any use, distribution, transmission, printing, copying
 or
  dissemination of this information in any way or in any manner is
 strictly
  prohibited. If you have received this communication in error, please
 delete this
  mail  notify us immediately at ad...@sifycorp.com
 




Thanks Walter for clarifying that.

I too was wondering what laks meant.

It was a bit distracting when I read the original post.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 12:34 AM, Mattmann, Chris A (388J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 WOW, +1!! Great job, PHP!

 Cheers,
 Chris



 On 11/15/09 10:13 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Hi,

 I'm not sure if you have a specific question there.
 But regarding PHP integration part, I just learned PHP now has native
 Solr (1.3 and 1.4) support:

  http://twitter.com/otisg/status/5757184282


 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
  From: mbneto mbn...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sun, November 15, 2009 4:56:15 PM
  Subject: Newbie tips: migrating from mysql fulltext search / PHP
 integration
 
  Hi,
 
  I am looking for alternatives to MySQL fulltext searches.  The combo
  Lucene/Solr is one of my options and I'd like to gather as much
 information
  I can before choosing and even build a prototype.
 
  My current need does not seem to be different.
 
  - fast response time (currently some searches can take more than 11sec)
  - API to add/update/delete documents to the collection
  - way to add synonymous or similar words for misspelled ones (ex. Sony =
  Soni)
  - way to define relevance of results (ex. If I search for LCD return
  products that belong to the LCD category, contains LCD in the product
  definition or ara marked as special offer)
 
  I know that I may have to add external code, for example, to take the
  results and apply some business logic to resort the results but I'd like
 to
  know, besides the wiki and the solr 1.4 Enterprise Seacrh Server book
 (which
  I am considering to buy) the tips for solr usage.



 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++




Hi,

There is native support for Solr in PHP but currently you have to build it
as a PECL extension.

It is not bundled with the PHP source yet, but it is downloadable from the
PECL project homepage

http://pecl.php.net/package/solr

If you currently have pecl support built into your php installation you can
install it by running the following command

pecl install solr-beta

Some usage examples are available here

http://us3.php.net/manual/en/solr.examples.php

More details are available here

http://www.php.net/manual/en/book.solr.php

I use Solr with PHP 5.2

- In PHP, the SolrClient class has methods to add, update, delete and
rollback changes to the index made since the last commit.
- There are also built-in tools in Solr that allow you to analyze and modify
the data before indexing it and when searching for it.
- with Solr you can define synonyms (check the wiki for more details)
- Solr also allows you to sort by score (relevance)
- You can mark individual query terms as optional, required (+) or
prohibited (-)

My last two points could take care of your last requirement.
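For instance, a single query string can combine required, optional and prohibited terms and sort by relevance. This is only a sketch; the field names (category, name, condition) are made up for illustration, and it assumes the PECL Solr extension plus a local Solr server:

```php
<?php
// Sketch: '+' marks a required term, no prefix marks an optional term,
// '-' marks a prohibited term. Field names here are illustrative.
$query = new SolrQuery('+category:LCD name:LCD -condition:refurbished');
$query->addSortField('score', SolrQuery::ORDER_DESC); // most relevant first

$client   = new SolrClient(array('hostname' => 'localhost', 'port' => 8983, 'path' => '/solr'));
$response = $client->query($query);
print_r($response->getResponse());
?>
```

Documents matching the optional term rank higher than those that only satisfy the required one, which gives the "LCD category first, LCD in the name next" behavior asked about.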

Solr is awesome, and most of the searches I perform return sub-second
response times.

It's several hundred times easier and more efficient than MySQL fulltext.
Believe me.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-12 Thread Israel Ekpo
On Thu, Nov 12, 2009 at 3:39 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I am getting results from one query and I just need 2 index attribute
 values
 : . These index attribute values are used for form new Query to Solr.

 can you elaborate on what exactly you mean by These index attribute
 values are used for form new Query to Solr ... are you saying that you
 want to take the values from *every* document matching query#1 and use
 them to construct query#2

 this sounds like you arent' denormalizing your data enough when building
 your index.

 : Since Solr gives result only for GET request, hence there is restriction
 on
 : : forming query with all values.

 that's false ... you can post a query if you want, and there are not hard
 constraints on how big a query can be (just practical constraints on what
 your physical hardware can handle in a reasonable amount of time)

 :   SELECT id, first_name
 :   FROM student_details
 :   WHERE first_name IN (SELECT first_name
 :   FROM student_details
 :   WHERE subject= 'Science');
 :  
 :   If so, how performant is this kind of queries?

 even as a sql query this doesn't relaly make much sense to me (at least
 not w/o a better understanding of the table+data)

 why wouldn't you just say:

SELECT id, first_name FROM ...WHERE subject='Science'

 ..or in Solr...

q=subject:Sciencefl=id,first_name



 -Hoss


It's also important to note that the Solr schema contains only one table,
so to speak, whereas a traditional database schema can have more than one
table, letting you do JOINs and subqueries across multiple tables to
retrieve the target data.

If you are bringing data from multiple database tables into the Solr index,
they have to be denormalized to fit into just one table in Solr.

So you will have to use a boolean AND or a filter query to simulate the
subquery you are trying to make.
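Since the Solr index is one flat table, the SQL subquery above collapses into a single query with a filter. A minimal Python sketch of building such a request URL (the host, port, and core path are assumptions for illustration):

```python
from urllib.parse import urlencode

def build_query_url(base_url, main_query, filter_query=None, fields=None):
    # fq narrows the result set the way the SQL subquery would,
    # and Solr caches filter queries separately from the main query.
    params = {"q": main_query, "wt": "json"}
    if filter_query:
        params["fq"] = filter_query
    if fields:
        params["fl"] = ",".join(fields)
    return base_url + "/select?" + urlencode(params)

url = build_query_url("http://localhost:8983/solr", "first_name:John",
                      filter_query="subject:Science",
                      fields=["id", "first_name"])
print(url)
```

urlencode takes care of escaping the reserved characters (the colon becomes %3A), so the query string stays valid no matter what terms are filled in.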

I hope this clears things a bit.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Commit error

2009-11-11 Thread Israel Ekpo
2009/11/11 Licinio Fernández Maurelo licinio.fernan...@gmail.com

 Hi folks,

 i'm getting this error while committing after a dataimport of only 12 docs
 !!!

 Exception while solr commit.
 java.io.IOException: background merge hit exception: _3kta:C2329239
 _3ktb:c11-_3ktb into _3ktc [optimize] [mergeDocStores]
 at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829)
 at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2750)
 at

 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
 at

 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
 at

 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:138)
 at

 org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:66)
 at
 org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:170)
 at
 org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:208)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
 at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
 at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
 at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
 Caused by: java.io.IOException: No hay espacio libre en el dispositivo
 at java.io.RandomAccessFile.writeBytes(Native Method)
 at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
 at

 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:191)
 at

 org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
 at

 org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
 at

 org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75)
 at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:45)
 at

 org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:229)
 at

 org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:184)
 at

 org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:217)
 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5089)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4589)
 at

 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
 at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

 Index info: 2.600.000 docs | 11G size
 System info: 15GB free disk space

 When attempting to commit the disk usage increases until solr breaks ... it
 looks like 15 GB is not enought space to do the merge | optimize

 Any advice?

 --
 Lici



Hi Licinio,

During the optimization process, the index size can grow to approximately
double what it was originally, and the remaining space on disk may not be
enough for the task.

You are describing exactly what could be going on.
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-10 Thread Israel Ekpo
On Tue, Nov 10, 2009 at 8:26 AM, Eugene Dzhurinsky b...@redwerk.com wrote:

 On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
  The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
  much better off using the DIH from this.
 
  This is the current Solr release candidate binary:
  http://people.apache.org/~gsingers/solr/1.4.0/http://people.apache.org/%7Egsingers/solr/1.4.0/

 In fact we are prohibited to use release candidates/nightly builds, we are
 forced to use only releases of Solr :(

 --
 Eugene N Dzhurinsky



Well, the official release is out and you can pick it up from your closest
mirror here

http://www.apache.org/dyn/closer.cgi/lucene/solr/


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Representing a complex schema in solr

2009-11-07 Thread Israel Ekpo
On Sat, Nov 7, 2009 at 11:37 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,

 i have a complex schema as shown below:

 Book
-  Title
-  Category
-  Publication
-  Edition
-  Publish Date
-  Author (multivalued)   = Author is a multivalued field containing
 the following attributes.
-  Name
-  Age
-  Location
-  Gender
- Qualification


 i wanna store the above information in solr so that i can query in every
 aspect

 one small query example would be:
 1. search for all the books written by females.
 2. search for all books writen by young authors...for example between the
 age 22 to 30.

 i woudn't wanna use RDBMS coz i have more than one million documents like
 this.

 i also tried saving the author as a JSON string. but then i cannot use wild
 card and range queries on it.

 any suggessions how wud i represent something like this in solr??

 Regards,
 Raakhi



Hi Rakhi,

I think you should do this to simplify your storage and retrieval process:

Instead of having one multi-valued author field, store each attribute as a
separate multi-valued field.

So name, age, location, gender and qualification will be separate fields in
the schema.

This will allow you to query the way you are asking

q=gender:female

OR by age

q=age:[22 TO 30]

Use tint (solr.TrieIntField) for the age field (if you are using Solr 1.4)
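As a sketch, the flattened author attributes could be declared in schema.xml like this (the field names and the tint type are assumptions based on the example schema that ships with Solr 1.4; each field is multi-valued because a book can have several authors):

```xml
<field name="name" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="age" type="tint" indexed="true" stored="true" multiValued="true"/>
<field name="location" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="gender" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="qualification" type="string" indexed="true" stored="true" multiValued="true"/>
```

The tint type keeps range queries like age:[22 TO 30] fast, since trie-encoded integers are built for numeric range searches.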

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: how to use ajax-solr - example?

2009-11-04 Thread Israel Ekpo
On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund jnyl...@yahoo.com wrote:

 Hi, I looked at the documentation and I have no idea how to get started?
 Can someone point me to or show me an example of how to send a query to a
 solr server and paginate through the results using ajax-solr.

 I would glady write a blog tutorial on how to do this if someone can get me
 started.

 I dont know jquery but have used prototype  scriptaculous.

 thanks
 Joel



Joel,

It will be best if you use a scripting language between Solr and JavaScript.

This is because sending requests directly from JavaScript to Solr will
limit you to only one domain name.

However, if you are using a scripting language between JavaScript and Solr,
you can use the scripting language to retrieve the request parameters from
JavaScript and then send them on to Solr with the response writer set to
JSON.

This will cause Solr to send the response in JSON format, which the
scripting language can pass on to JavaScript.

This example here will cause Solr to return the response in JSON.

http://example.com:8443/solr/select?q=searchkeyword&wt=json
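On the scripting-language side, pulling the documents back out of that JSON response is straightforward. A minimal Python sketch (the field names in the sample response are made up for illustration):

```python
import json

def extract_docs(solr_json):
    # A wt=json Solr response nests the hits under response.docs,
    # with the total hit count in response.numFound.
    response = json.loads(solr_json)["response"]
    return response["numFound"], response["docs"]

sample = ('{"responseHeader": {"status": 0, "QTime": 4},'
          ' "response": {"numFound": 1, "start": 0,'
          ' "docs": [{"id": "doc1", "title": "Hello"}]}}')
count, docs = extract_docs(sample)
print(count, docs[0]["title"])  # → 1 Hello
```

The intermediate script can re-serialize the docs list (or hand the raw JSON through unchanged) to the browser, keeping Solr itself off the public network.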


-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: How to integrate Solr into my project

2009-11-03 Thread Israel Ekpo
2009/11/3 Licinio Fernández Maurelo licinio.fernan...@gmail.com

 Hi Caroline,

 i think that you must take an overview tour ;-) , solrj is just a solr java
 client ...

 Some clues:


   - Define your own index schema
 http://wiki.apache.org/solr/SchemaXml(it's just like a SQL DDL) .
   - There are different ways to put docs in your index:
  - SolrJ (Solr client for java env)
  - DIH http://wiki.apache.org/solr/DataImportHandler (Data Import
  Handler) this one is prefered when doing a huge data import from
 DB's, many
  source formats are supported.
   - Try to perform queries over your fancy-new index ;-). Learn about
   searching syntax and
 facetinghttp://wiki.apache.org/solr/SolrFacetingOverview
   .






 2009/11/3 Caroline Tan caroline@gmail.com

  Ya, it's a Java projecti just browse this site you suggested...
  http://wiki.apache.org/solr/Solrj
 
  Which means, i declared the dependancy to solr-solrj and solr-core jars,
  have those jars added to my project lib and by following the Solrj
  tutorial,
  i will be able to even index a DB table into Solr as well? thanks
 
  ~caroLine
 
 
  2009/11/3 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
   is it a java project ?
   did you see this page http://wiki.apache.org/solr/Solrj ?
  
   On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan caroline@gmail.com
   wrote:
Hi,
I wish to intergrate Solr into my current working project. I've
 played
around the Solr example and get it started in my tomcat. But the next
   step
is HOW do i integrate that into my working project? You see, Lucence
provides API and tutorial on what class i need to instanstiate in
 order
   to
index and search. But Solr seems to be pretty vague on this..as it is
 a
working solr search server. Can anybody help me by stating the steps
 by
steps, what classes that i should look into in order to assimiliate
  Solr
into my project?
Thanks.
   
regards
~caroLine
   
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
 



 --
 Lici



I would also recommend buying the Solr 1.4 Enterprise Search Server.

It will give you some tips

http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8s=booksqid=1257247932sr=1-1
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: tracking solr response time

2009-11-02 Thread Israel Ekpo
On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh
 bharathv6.proj...@gmail.com wrote:
 We are using solr for many of ur products  it is doing quite well
  .  But since no of hits are becoming high we are experiencing latency
  in certain requests ,about 15% of our requests are suffering a latency

 How much of a latency compared to normal, and what version of Solr are
 you using?

   . We are trying to identify  the problem .  It may be due to  network
  issue or solr server is taking time to process the request  .   other
  than  qtime which is returned along with the response is there any
  other way to track solr servers performance ?
  how is qtime calculated
  , is it the total time from when solr server got the request till it
  gave the response ?

 QTime is the time spent in generating the in-memory representation for
 the response before the response writer starts streaming it back in
 whatever format was requested.  The stored fields of returned
 documents are also loaded at this point (to enable handling of huge
 response lists w/o storing all in memory).

 There are normally servlet container logs that can be configured to
 spit out the real total request time.

  can we do some extra logging to track solr servers
  performance . ideally I would want to pass some log id along with the
  request (query ) to  solr server  and solr server must log the
  response time along with that log id .

 Yep - Solr isn't bothered by params it doesn't know about, so just put
 logid=xxx and it should also be logged with the other request
 params.

 -Yonik
 http://www.lucidimagination.com




If you are not using Java then you may have to track the elapsed time
manually.

If you are using the SolrJ Java client you may have the following options:

There is a method called getElapsedTime() in
org.apache.solr.client.solrj.response.SolrResponseBase which is available to
all the subclasses

I have not used it personally but I think this should return the time spent
on the client side for that request.

The QTime is not the time on the client side but the time spent internally
at the Solr server to process the request.

http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html

http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html
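For a non-Java client, a minimal sketch of measuring the client-side elapsed time yourself and comparing it against QTime (the lambda below is a stand-in for an actual HTTP call to Solr):

```python
import time

def timed_request(request_fn):
    # Wall-clock elapsed time, in milliseconds, around any request call.
    start = time.perf_counter()
    result = request_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def overhead_ms(elapsed_ms, qtime_ms):
    # Whatever QTime does not account for: network transfer,
    # response serialization, and client-side parsing.
    return elapsed_ms - qtime_ms

# Simulated response standing in for a real Solr call.
response, elapsed = timed_request(lambda: {"responseHeader": {"QTime": 12}})
print(overhead_ms(elapsed, response["responseHeader"]["QTime"]))
```

If the overhead stays small while total latency spikes, the time is going to Solr itself; if QTime stays small, look at the network or the client instead.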

Most likely it is the result of an internal network issue between the two
servers, or the Solr server is competing with other applications for
resources.

What operating system is the Solr server running on? Is your client
application connecting to the Solr server on the same network or over the
internet? Are there other applications, like database servers, running on
the same machine? If so, the DB server (or any other application) and the
Solr server could be competing for resources like CPU and memory.

If you are using Tomcat, you can take a look in
$CATALINA_HOME/logs/catalina.out, there are timestamps there that can also
guide you.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: tracking solr response time

2009-11-02 Thread Israel Ekpo
On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh 
bharathv6.proj...@gmail.com wrote:

 Thanks for the quick response
 @yonik

 How much of a latency compared to normal, and what version of Solr are
 you using?

 latency is usually around 2-4 secs (some times it goes more than that
 )  which happens  to  only 15-20%  of the request  other  80-85% of
 request are very fast it is in  milli secs ( around 200,000 requests
 happens every day )

 @Israel  we are not using java client ..  we  r using  python at the
 client with response formatted in json

 @yonikn @Israel   does qtime measure the total time taken at the solr
 server ? I am already measuring the time to get the response  at
 client  end . I would want  a means to know how much time the solr
 server is taking to respond (process ) once it gets the request  . so
 that I could identify whether it is a solr server issue or internal
 network issue


It is the time spent at the Solr server.

I think Yonik already answered this part in his response to your thread :

This is what he said :

QTime is the time spent in generating the in-memory representation for
the response before the response writer starts streaming it back in
whatever format was requested.  The stored fields of returned
documents are also loaded at this point (to enable handling of huge
response lists w/o storing all in memory).



 @Israel  we are using rhel server  5 on both client and server .. we
 have 6 solr sever . one is acting as master . both client and solr
 sever are on the same network . those servers are dedicated solr
 server except 2 severs which have DB and memcahce running .. we have
 adjusted the load accordingly







 On 11/2/09, Israel Ekpo israele...@gmail.com wrote:
  On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley
  yo...@lucidimagination.comwrote:
 
  On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh
  bharathv6.proj...@gmail.com wrote:
  We are using solr for many of ur products  it is doing quite well
   .  But since no of hits are becoming high we are experiencing latency
   in certain requests ,about 15% of our requests are suffering a latency
 
  How much of a latency compared to normal, and what version of Solr are
  you using?
 
. We are trying to identify  the problem .  It may be due to  network
   issue or solr server is taking time to process the request  .   other
   than  qtime which is returned along with the response is there any
   other way to track solr servers performance ?
   how is qtime calculated
   , is it the total time from when solr server got the request till it
   gave the response ?
 
  QTime is the time spent in generating the in-memory representation for
  the response before the response writer starts streaming it back in
  whatever format was requested.  The stored fields of returned
  documents are also loaded at this point (to enable handling of huge
  response lists w/o storing all in memory).
 
  There are normally servlet container logs that can be configured to
  spit out the real total request time.
 
   can we do some extra logging to track solr servers
   performance . ideally I would want to pass some log id along with the
   request (query ) to  solr server  and solr server must log the
   response time along with that log id .
 
  Yep - Solr isn't bothered by params it doesn't know about, so just put
  logid=xxx and it should also be logged with the other request
  params.
 
  -Yonik
  http://www.lucidimagination.com
 
 
 
 
  If you are not using Java then you may have to track the elapsed time
  manually.
 
  If you are using the SolrJ Java client you may have the following
 options:
 
  There is a method called getElapsedTime() in
  org.apache.solr.client.solrj.response.SolrResponseBase which is available
 to
  all the subclasses
 
  I have not used it personally but I think this should return the time
 spent
  on the client side for that request.
 
  The QTime is not the time on the client side but the time spent
 internally
  at the Solr server to process the request.
 
 
 http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html
 
 
 http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html
 
  Most likely it could be as a result of an internal network issue between
 the
  two servers or the Solr server is competing with other applications for
  resources.
 
  What operating system is the Solr server running on? Is you client
  application connection to a Solr server on the same network or over the
  internet? Are there other applications like database servers etc running
 on
  the same machine? If so, then the DB server (or any other application)
 and
  the Solr server could be competing for resources like CPU, memory etc.
 
  If you are using Tomcat, you can take a look in
  $CATALINA_HOME/logs/catalina.out, there are timestamps there that can
 also
  guide you.
 
  --
  Good Enough is not good enough.
  To give

Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-10-30 Thread Israel Ekpo
On Fri, Oct 30, 2009 at 11:23 AM, Eugene Dzhurinsky b...@redwerk.comwrote:

 Hi there!

 We are trying to evaluate Apache Solr for our custom search implementation,
 which
 includes the following requirements:

 - ability to add/update/delete a lot of documents at once

 - ability to iterate over all documents, returned in search, as Lucene does
  provide within a HitCollector instance. We would need to extract and
  aggregate various fields, stored in index, to group results and aggregate
 them
  in some way.

 After reading the tutorial I've realized that adding and removal of
 documents
 is performed through passing an XML file to controller in POST request.
 However our XML files may be very, very large - so I hope there is some
 another option to avoid interaction through HTTP protocol.

 Also I did not find any way in the tutorial to access the search results
 with
 all fields to be processed by our application.

 I think I simply did not read the documentation well or missed some point,
 so
 can somebody please point me to the articles, which may explain basics of
 how
 to achieve my goals?

 Thank you very much in advance!

 --
 Eugene N Dzhurinsky


Hi Eugene

Solr has an embedded version but you are encouraged to use the standard web
service interfaces.

Also, the recently released Solr 1.4 white paper talks about the
StreamingUpdateSolrServer, which according to the white paper can index
documents at lightning speed, up to 25K documents per second.

The white paper can be downloaded here

http://www.lucidimagination.com/whitepaper/whats-new-in-solr-1-4

Info about Streaming Update Solr Server is available here

http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

If you are still interested in the Embedded version to avoid the HTTP
version you can check out the following links

http://wiki.apache.org/solr/EmbeddedSolr

http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html

I hope this helps.

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Indexing multiple entities

2009-10-29 Thread Israel Ekpo
On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola 
penyask...@gmail.com wrote:

 Hi, my name is Christian and I'm a newbie introducing to solr (and solrj).

 I'm working on a website where I want to index multiple entities, like
 Book or Magazine.
 The issue I'm facing is both of them have an attribute ID, which I
 want to use as the uniqueKey on my schema, so I cannot identify
 uniquely a document (because ID is saved in a database too, and it's
 autonumeric).

 I'm sure that this is a common pattern, but I don't find the way of solving
 it.

 How do you usually solve this? Thanks in advance.


 --
 Cheers,

 Christian López Espínola penyaskito


Hi Christian,

It looks like you are bringing in data to Solr from a database where there
are two separate tables.

One for *Books* and another one for *Magazines*.

If this is the case, you could define the uniqueKey field in your Solr
schema to be a string instead of an integer. You can then still load
documents from both the books and magazines database tables, but prefix
the uniqueKey field with B for books and M for magazines.

Like so :

<field name="id" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>id</uniqueKey>

Then when loading the books or magazines into Solr you can create the
documents with id fields like this

<add>
  <doc>
    <field name="id">B14000</field>
  </doc>
  <doc>
    <field name="id">M14000</field>
  </doc>
  <doc>
    <field name="id">B14001</field>
  </doc>
  <doc>
    <field name="id">M14001</field>
  </doc>
</add>
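A small sketch of generating and decoding such prefixed ids on the client side (the helper names here are made up for illustration):

```python
PREFIXES = {"book": "B", "magazine": "M"}
TYPES = {v: k for k, v in PREFIXES.items()}

def make_solr_id(entity_type, db_id):
    # Prefix the auto-increment database id with a type marker
    # so ids stay unique across the two source tables.
    return PREFIXES[entity_type] + str(db_id)

def split_solr_id(solr_id):
    # Recover the entity type and the original database id.
    return TYPES[solr_id[0]], int(solr_id[1:])

print(make_solr_id("book", 14000))   # → B14000
print(split_solr_id("M14001"))       # → ('magazine', 14001)
```

Keeping both directions in one place means the application can always map a Solr hit back to the right database table.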

I hope this helps
-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Version 0.9.3 of the PECL extension for solr has just been released

2009-10-19 Thread Israel Ekpo
Version 0.9.3 of the PECL extension for solr has just been released.

Some of the methods have been updated and more get* methods have been added
to the Query builder classes.

The user level documentation was also updated to make the installation
instructions a lot clearer.

The latest documentation and source code are available from the project home
page

http://pecl.php.net/package/solr

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Solr 1.4 Release Party

2009-10-12 Thread Israel Ekpo
It is my email signature.

It is a sort of hybrid/mashup from different sources.

On Mon, Oct 12, 2009 at 6:49 PM, Michael Masters mmast...@gmail.com wrote:

 Where does the quote come from :)

 On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo israele...@gmail.com wrote:
  I can't wait...
 
  --
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
 




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Solr 1.4 Release Party

2009-10-10 Thread Israel Ekpo
I can't wait...

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Announcing the Apache Solr extension in PHP - 0.9.0

2009-10-04 Thread Israel Ekpo
Fellow Apache Solr users,

I have been working on a PHP extension for Apache Solr in C for quite some
time now.

I just finished testing it, and I have completed the initial user-level
documentation of the API.

Version 0.9.0-beta has just been released.

It already has built-in readiness for Solr 1.4

If you are using Solr 1.3 or later in PHP, I would appreciate if you could
check it out and give me some feedback.

It is very easy to install on UNIX systems. I am still working on the build
for Windows; it should be available soon.

http://solr.israelekpo.com/manual/en/solr.installation.php

A quick list of some of the features of the API include :
- Built in serialization of Solr Parameter objects.
- Reuse of HTTP connections across repeated requests.
- Ability to obtain input documents for possible resubmission from query
responses.
- Simplified interface to access server response data (SolrObject)
- Ability to connect to Solr server instances secured behind HTTP
Authentication and proxy servers

The following components are also supported
- Facets
- MoreLikeThis
- TermsComponent
- Stats
- Highlighting

Solr PECL Extension Homepage
http://pecl.php.net/package/solr

Some examples are available here
http://solr.israelekpo.com/manual/en/solr.examples.php

Interim Documentation Page until refresh of official PHP documentation
http://solr.israelekpo.com/manual/en/book.solr.php

The C source is available here
http://svn.php.net/viewvc/pecl/solr/

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Quotes in query string cause NullPointerException

2009-10-01 Thread Israel Ekpo
Don't be too hard on yourself.

Sometimes, mistakes like that can happen even to the most brilliant and most
experienced.

On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg andrew.cl...@gmail.com wrote:


 Sorry! I'm officially a complete idiot.

 Personally I'd try to catch things like that and rethrow a
 'QueryParseException' or something -- but don't feel under any obligation
 to
 listen to me because, well, I'm an idiot.

 Thanks :-)

 Andrew.


 Erik Hatcher-4 wrote:
 
  don't forget q=...  :)
 
Erik
 
  On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
 
 
  Hi folks,
 
  I'm using the 2009-09-30 build, and any single or double quotes in
  the query
  string cause an NPE. Is this normal behaviour? I never tried it with
  my
  previous installation.
 
  Example:
 
  http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
 
  (I've also tried without the URL encoding, no difference)
 
  Response:
 
  HTTP Status 500 - null java.lang.NullPointerException at
  java.io.StringReader.init(StringReader.java:33) at
  org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:
  173) at
  org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:
  78) at
  org.apache.solr.search.QParser.getQuery(QParser.java:131) at
  org
  .apache
  .solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
  at
  org
  .apache
  .solr
  .handler
  .component.SearchHandler.handleRequestBody(SearchHandler.java:174)
  at
  org
  .apache
  .solr
  .handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
  org
  .apache
  .solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at
  org
  .apache
  .solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at
  org
  .apache
  .catalina
  .core
  .ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:
  235)
  at
  org
  .apache
  .catalina
  .core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
  org
  .apache
  .catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:
  233)
  at
  org
  .apache
  .catalina.core.StandardContextValve.invoke(StandardContextValve.java:
  175)
  at
  org
  .apache
  .catalina.valves.RequestFilterValve.process(RequestFilterValve.java:
  269)
  at
  org
  .apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:
  81)
  at
  org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:
  568)
  at
  org
  .apache
  .catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .apache
  .catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at
  org
  .apache
  .catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:
  109)
  at
  org
  .apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:
  286)
  at
  org
  .apache.coyote.http11.Http11Processor.process(Http11Processor.java:
  844)
  at
  org.apache.coyote.http11.Http11Protocol
  $Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint
  $Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)
 
  Single quotes have the same effect.
 
  Is there another way to specify exact phrases?
 
  Thanks,
 
  Andrew.
 
  --
  View this message in context:
 
 http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25704050.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


XSD for Solr Response Format Version 2.2

2009-09-21 Thread Israel Ekpo
I am working on an XSD document for all the types in the response XML
format, version 2.2.

Do you think there is a need for this?

-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Limit number of docs that can be indexed (security)

2009-09-21 Thread Israel Ekpo
Valdir,

I think you are making it more complicated than it needs to be.

As the administrator, if you don't want them to modify the contents of the
solrconfig.xml file then you should not give them access to do so.

If they already have access to change the contents of the file, you can
revoke such privileges.

That should do it. The users should only work on the client side (adding
documents, sending queries).

On Mon, Sep 21, 2009 at 6:14 PM, Valdir Salgueiro sombraex...@gmail.comwrote:

 Hello,

 I need a way to limit the number of documents that can be indexed on my
 solr-based application. Here is what I have come up with: create a *
 UpdateRequestProcessor* and register it on *solrconfig.xml*. When the user
 tries to add a document, check if the docs limit has been reached. The
 problem is, the user can modify solrconfig.xml and remove the *
 UpdateRequestProcessor* so he can index as much as he wants.

 Any ideas how to implement such restriction in a safer manner?

 Thanks in advance,
 Valdir

 PS: Of course, I also need to make sure the user cannot modify how many
 files he can index, but I think some encription on the properties file
 which
 holds that information will do for now.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: When to use Solr over Lucene

2009-09-16 Thread Israel Ekpo
Comparing Solr to Lucene is not exactly an apples-to-apples comparison.

Solr is a superset of Lucene. It uses the Lucene engine to index and process
requests for data retrieval.

Start here first:
http://lucene.apache.org/solr/features.html#Solr+Uses+the+Lucene+Search+Library+and+Extends+it

It would be unfair to compare the Apache webserver to a CGI scripting
interface.

The Apache webserver is just the container through which the web browser
interacts with the CGI scripts.

This is very similar to how Solr is related to Lucene.

On Wed, Sep 16, 2009 at 9:26 AM, balaji.a reachbalaj...@gmail.com wrote:


 Hi All,
   I am aware that Solr internally uses Lucene for search and indexing. But
 it would be helpful if anybody explains about Solr features that is not
 provided by Lucene.

 Thanks,
 Balaji.
 --
 View this message in context:
 http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.

