Overall

2008-06-09 Thread Mihails Agafonovs
Hi!

Some questions:
1) Is it possible to make Solr use, for example, a MySQL database,
or does it only support *.xml files as a database?
2) Is there a way to add data to the search database using some
online interface, or is manually adding the data to the *.xml files
the only way?
3) Is there any guide on how to integrate Solr into a web site?
 Regards, Mihails

Re: Overall

2008-06-09 Thread Umar Shah
2008/6/9 Mihails Agafonovs [EMAIL PROTECTED]:

 Hi!

 Some questions:
 1) Is it possible to make Solr use, for example, a MySQL database,
 or does it only support *.xml files as a database?

You can use DataImportHandler to index from MySQL (or other databases).


 2) Is there a way to add data to the search database using some
 online interface, or is manually adding the data to the *.xml files
 the only way?


You can generate the XML from a program that reads from the data source,
or use one of the Solr clients (Java, Python, Ruby) to update the index
through the provided APIs.
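
As a rough illustration of the "generate the XML from a program" option, here is a minimal Python sketch that turns rows into a Solr `<add>` update message. The rows and field names (`id`, `title`) are made up for the example; in practice they would come from your own database query and must match the fields in your schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(rows):
    """Turn a list of dicts into a Solr <add> update message (one <doc> per row)."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    # ElementTree escapes special characters such as '&' for us.
    return ET.tostring(add, encoding="unicode")


# Hypothetical rows, e.g. fetched from a MySQL table by your own code.
rows = [
    {"id": 1, "title": "First article"},
    {"id": 2, "title": "Ben & Jerry"},
]
xml_message = build_add_xml(rows)
print(xml_message)
```

The resulting message would then be posted to the Solr update handler and followed by a commit, in the same way the example *.xml files are posted.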


 3) Is there any guide on how to integrate Solr into a web site?


Whatever documentation exists is in the wiki and the mailing list archives.
If you can't find it there, I am afraid it is not available.


  Regards, Mihails


Re: Overall

2008-06-09 Thread Mihails Agafonovs
1) OK.
2) This means developing some custom program, so there is no such
functionality in Solr :(
3) I have some connection problems and I really can't load the
mailing list archives at all! Anyway, I want to understand how I can
use Solr on my site, or for any other purpose.
 Regards, Mihails



Re: Overall

2008-06-09 Thread Dom Stockdale

Hi Mihails,

I don't know about points 1 and 2, as I'm just starting with Solr, but
for point 3 you need to understand that Solr just returns XML for your
queries, so you can use any web language to parse the XML results. It
might return other formats like JSON as well (I haven't figured this
out yet), but it's not intended to give you a full-blown web page.
That's up to you; Solr will just give you the data.
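
To make the "parse the XML yourself" point concrete, here is a small Python sketch. The sample response is hand-written in the general shape of a Solr XML result (a `result` element with a `numFound` attribute and one `doc` per hit); the field names and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a Solr XML response.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">First article</str>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="title">Second article</str>
    </doc>
  </result>
</response>"""


def parse_results(xml_text):
    """Return (total hit count, list of {field name: text} dicts)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    num_found = int(result.get("numFound"))
    docs = [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]
    return num_found, docs


num_found, docs = parse_results(SAMPLE_RESPONSE)
print(num_found, [d["title"] for d in docs])
```

Your web page code would then render `docs` however it likes; Solr itself plays no part in the HTML.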


- d



--
Dominic Stockdale
[EMAIL PROTECTED]
+44(0)1273 311407
+44(0)7886 654562
skype: domonline






setAllowLeadingWildcard

2008-06-09 Thread Dom Stockdale

Hello list,

I really need to set setAllowLeadingWildcard to true, and I'm wondering
if you can advise me on the best way to do this. I am a newbie, so
forgive me if I'm being a dummy.


I've established that it's not settable in version 1.2.0, which seems
to be quite old, so I've been looking through trunk to see what's
there. It now seems to be an option; however, the current version in
trunk appears to be broken.


Does anyone know a revision number from svn that might be working
and where setAllowLeadingWildcard is settable?


Is there another way I can set setAllowLeadingWildcard to true if I'm
trying to do this the wrong way?


Thanks

- Dom


DataImport

2008-06-09 Thread Mihails Agafonovs
I looked through the tutorial on data import, section "Full Import
Example".
1) Where is this dataimport.jar? There is no such file in the
extracted example-solr-home.jar.
2) "Use the solr folder inside example-data-config folder as your
solr home." What does this mean? In any case, there is no folder
named example-data-config.
 Regards, Mihails

Re: DataImport

2008-06-09 Thread Shalin Shekhar Mangar
1. Correct, there is no jar. You can use the solr.war file. If you really
need a jar, you'll need to use the SOLR-469.patch at
http://issues.apache.org/jira/browse/SOLR-469 and build solr from source
after applying that patch.
2. The jar contains a folder named example-solr-home. Please check again.

Please let me know if you run into any problems.





-- 
Regards,
Shalin Shekhar Mangar.


Solr system and numbers

2008-06-09 Thread dudes dudes

Hello experts, 

How does Solr deal with numbers or phone numbers? For example, if you have
1234 and 12 34 or 1 234, with spaces between the digits.
Or is this handled by Lucene?

Is there any documentation or tutorial on this?

many thanks, 
ak 

Re: DataImport

2008-06-09 Thread Mihails Agafonovs
I've placed the solr.war under the Tomcat directory and restarted Tomcat
to deploy it. But still... there is no .jar, no folder named
example-data-config, and hitting
http://localhost:8983/solr/dataimport doesn't work.
Do I need the original Solr instance to use this .war with?
 Regards, Mihails



RE: Solr system and numbers

2008-06-09 Thread dudes dudes

Great info, thanks a lot, all.



 Date: Mon, 9 Jun 2008 05:58:50 -0700
 From: [EMAIL PROTECTED]
 Subject: Re: Solr system and numbers
 To: solr-user@lucene.apache.org
 
 Hi,
 Solr/Lucene can treat phone numbers as strings.  If you want to clean them up 
 and normalize them outside of Solr, you can do that and feed them into Solr 
 as pure numbers.
 
 How the phone numbers will be treated after you pump them into Solr depends 
 on the analyzer you choose to use for this data.  If you don't need to search 
 on subsets of phone numbers, then just don't tokenize them (i.e. use string 
 type if the phone numbers contain any non-numeric characters, sint otherwise).
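
 A minimal sketch of the "clean them up outside of Solr" step, assuming the only goal is to make 1234, 12 34 and 1 234 index as the same value. The function name is made up for the example, and note that it also drops a leading "+", which may or may not be what you want.

```python
import re


def normalize_phone(raw):
    """Keep only the digits, so differently spaced numbers compare equal."""
    return re.sub(r"\D", "", raw)


for raw in ["1234", "12 34", "1 234"]:
    print(repr(raw), "->", normalize_phone(raw))
```

 The normalized value can then be fed into an untokenized field, as described above.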
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 

Re: Problems in solrJ trunk

2008-06-09 Thread Alexander Ramos Jardim
Well,

There is a simple case here. I tried to update SolrJ to the latest version,
and the application I had selected for testing broke. So I developed an
alternative interface for SolrServer and a wrapper around
CommonsHttpSolrServer, altered my application to use them, and everything
is working nicely.

When you use good practices like IoC or AOP, it is preferable to program
against interfaces.

Sorry, but I find it really bad to go from an interface to an abstract
class.

In my modest opinion, SolrServer should have both: SolrServer would be an
interface, AbstractSolrServer would implement it, and CommonsHttpSolrServer
would extend AbstractSolrServer.

If the dev team really thinks interfaces are a pain, I think we will have
problems using Solr with other advanced OO features.

2008/6/7 Ryan McKinley [EMAIL PROTECTED]:

 solrj was not released in 1.2, so the change is not incompatible...

 The rationale for abstract class vs interface is more to do with usage and
 future maintenance.  If SolrServer is an interface and solr 1.4 adds
 methods, there is no way to make it backwards compatible -- as an abstract
 class, we can add a reasonable default behavior.
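
 The backwards-compatibility argument can be sketched roughly as follows. The thread is about Java, so this Python version is only an analogy, and every class and method name in it is hypothetical rather than part of SolrJ. It shows a base class gaining a new method with a default body while a subclass written earlier keeps working unchanged.

```python
from abc import ABC, abstractmethod


class SearchServer(ABC):
    """Hypothetical stand-in for an abstract, SolrServer-like base class."""

    @abstractmethod
    def request(self, path, params):
        """Send one request; every concrete server must implement this."""

    # Imagine this method was added in a later release. Subclasses written
    # before it existed inherit the default body without any change.
    def ping(self):
        return self.request("/admin/ping", {})


class InMemoryServer(SearchServer):
    """A subclass written before ping() existed; it only knows request()."""

    def request(self, path, params):
        return {"path": path, "params": params, "status": "OK"}


server = InMemoryServer()
print(server.ping())
```

 With a pure interface, adding `ping` would instead force every existing implementation to be edited before it compiles again.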

 Since SolrServer is a rather involved action, it seems like it will tend to
 be a standalone class.  Interfaces are great for OO clarity, but very
 difficult to maintain.

 Is there a good usage case we are not thinking of before this gets baked
 into 1.3?

 ryan



 On Jun 7, 2008, at 2:13 PM, Alexander Ramos Jardim wrote:

 Hello,

 Shouldn't SolrServer be an interface that externalizes the signatures for
 classes like CommonsHttpSolrServer, like it was in solr-1.2? Why did it
 become an abstract class?
 I can't see any benefit from it, as now I need to type the object as
 CommonsHttpSolrServer directly.
 I think it is really bad.
 --
 Alexander Ramos Jardim





-- 
Alexander Ramos Jardim


Re: DataImport

2008-06-09 Thread Shalin Shekhar Mangar
No, the steps are as follows:
1. Download the example-solr-home.jar from the DataImportHandler wiki page
2. Extract it. You'll find a folder named example-solr-home and a solr.war
file after extraction
3. Copy the solr.war to tomcat_home/webapps. You don't need any other solr
instance. This war is self-sufficient.
4. You need to set the example-solr-home/solr folder as the solr home
folder. For instructions on how to do that, look at
http://wiki.apache.org/solr/SolrTomcat

From the port number of the URL you are trying, it seems that you're using
the Jetty supplied with Solr instead of Tomcat.





-- 
Regards,
Shalin Shekhar Mangar.


Re: Problems in solrJ trunk

2008-06-09 Thread Otis Gospodnetic
Hi,

This interface vs. abstract class and maintenance/backwards compatibility 
question comes up pretty often.  I suggest using markmail.org and searching for 
things like:
interface abstract solr -jira
interface abstract lucene -jira

I think that will lead to some explanations without anyone having to go into 
this discussion again.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Problems in solrJ trunk

2008-06-09 Thread Lucas F. A. Teixeira

Exactly,

And adding the methods in the abstract class in the minor releases, and 
in the interface in major releases.


[]s,

Lucas

Lucas Frare A. Teixeira
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
Tel: +55 11 3660.1622 - R3018





Re: Problems in solrJ trunk

2008-06-09 Thread Alexander Ramos Jardim
Thank you Lucas,

You caught my point nicely and even had a clearer idea of what to do.
Sorry, Solr Dev Team, but I don't think there is any reasonable excuse for
framing this as interface vs. abstract class, as they complement each other
and don't play the same role in OOP.

Anyway, Solr is a great app. I just don't think it follows the best
programming practices, but that is just me.

Let's not turn this into a flame war or a big discussion.



-- 
Alexander Ramos Jardim


solrj client in mven repository?

2008-06-09 Thread Zsolt Czinkos
Hello all

I'm new to Solr and have a question about the Java client. Is it
going to be available from the central Maven repository? I had a look and
saw that it is under development (1.3 dev), but someone may have the
answer.

I built the trunk and solrj code seems to be separated from solr server's code.


Best regards

Zsolt


Re: NullPointerException at lucene.analysis.StopFilter with 1.3

2008-06-09 Thread Ronald K. Braun
 : I'm just looking into transitioning from solr 1.2 to 1.3 (trunk).  I
 : have some legacy handler code (called AdvancedRequestHandler) that
 : used to work with 1.2 but now throws an exception using 1.3 (latest
 : nightly build).

 This is an interesting use case that wasn't really considered when we
 switched away from using the SolrCore singleton ...
 When I have some more time, I'll spin up a thread on solr-dev to discuss
 what we should do about this -- in the meantime, feel free to file a bug
 that StopFilter isn't backwards compatible.

Created SOLR-594 for this issue.

 FWIW: constructing a new TokenizerChain inside your RequestHandler's
 handleRequest method seems unnecessary.  If nothing else, you could
 do this in your init method and reuse the TokenizerChain on every request.
 But if it were me, I'd just use the schema.xml to declare a fieldtype that
 had the behavior I want, and then use
 schema.getFieldType(specialType).getQueryAnalyzer().tokenStream(...)

I actually had a single reusable version, but flattened it back out in
the code snippet for clarity.  But thanks for the tactful suggestion.
:-)  I didn't know that you could fetch the tokenizer chain directly
from the schema (how cool), which was what was originally desired --
the constructed tokenizer was just mirroring an existing field.  I
appreciate the tip, Hoss -- much cleaner!

r


XSL scripting

2008-06-09 Thread Lance Norskog
 This started out in the num-docs thread, but deserves its own. And a wiki
page.

There is a more complex and general way to get the number of documents in
the index. I run a query against solr and postprocess the output with an XSL
script.

Install this xsl script as home/conf/xslt/numfound.xsl.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="response/result/@numFound" />
<xsl:text>&#x0A;</xsl:text>
</xsl:template>
</xsl:stylesheet>

Make sure 'curl' is installed, and add numfound.sh, a unix shell script.

SHARD=localhost:8080/solr
QUERY=$1

LINK="http://$SHARD/select?indent=on&version=2.2&q=$QUERY&start=0&rows=0&fl=*&wt=xslt&tr=numfound.xsl"
curl --silent "$LINK" -H Content-Type:text -X GET

Run it as
sh numfound.sh '*:*'

Instructions for installing the XSLT script are on the Wiki.
Star-colon-star is magic for 'all records'.
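
For anyone calling this from a program rather than a shell, here is a hedged Python sketch that builds the same request URL. The parameter names are the ones used in the script; the function name is made up, and `urlencode` takes care of the `&` separators and of escaping the query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def numfound_url(shard, query):
    """Build the select URL that asks Solr to apply numfound.xsl to the result."""
    params = {
        "indent": "on",
        "version": "2.2",
        "q": query,
        "start": 0,
        "rows": 0,
        "fl": "*",
        "wt": "xslt",
        "tr": "numfound.xsl",
    }
    return "http://" + shard + "/select?" + urlencode(params)


url = numfound_url("localhost:8080/solr", "*:*")
print(url)
```

Fetching that URL (with curl or any HTTP client) should return just the numFound value, assuming the stylesheet is installed as described.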
 

XSL is appalling garbage.

Cheers!
 



Re: solrj client in mven repository?

2008-06-09 Thread spencer.c

It is not in a central repo yet, though this has been requested.  See the
issue I filed here:
https://issues.apache.org/jira/browse/SOLR-586

If you follow the outline there, you can build/install into your own repo
pretty easily.






html to text based on some sort of uniqueness metric

2008-06-09 Thread Cam Bazz
Hello,

I am indexing newspaper articles as an exercise in Solr. When dealing with
newspaper articles in the past, I always tried to get the div or the table
that contains the actual news, using NekoHTML to traverse the DOM tree and
take the text from that div or table. When dealing with many newspapers, it
is a hassle to write custom code to extract the relevant information. There
is usually a lot of garbage in the HTML, from categories to ads, and
furthermore the pages change, so static coding is problematic.

I have been wondering whether I could measure the frequency or uniqueness of
each node and find the news automatically, but I have not come up with an
implementation.
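
One possible starting point, sketched in Python with only the standard library. Note that this is not the frequency/uniqueness measure described above but a simpler, swapped-in heuristic: score each div or td by how much plain text it holds directly, minus the text inside links, and keep the best one. The class name, the container list and the sample page are all assumptions made for the example, and it does not handle malformed markup or script/style content.

```python
from html.parser import HTMLParser

CONTAINERS = {"div", "td"}  # assumed block-level candidates for "the article"


class BlockScorer(HTMLParser):
    """Collects, per container element, its plain text and its link text size."""

    def __init__(self):
        super().__init__()
        self.stack = []       # indexes of the containers that are currently open
        self.blocks = []      # one record per container seen so far
        self.link_depth = 0   # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.blocks.append({"text": [], "plain": 0, "linked": 0})
            self.stack.append(len(self.blocks) - 1)
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        block = self.blocks[self.stack[-1]]  # innermost open container
        if self.link_depth:
            block["linked"] += len(text)
        else:
            block["plain"] += len(text)
            block["text"].append(text)


def extract_main_text(html):
    """Return the text of the container with the most non-link text."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.blocks:
        return ""
    best = max(scorer.blocks, key=lambda b: b["plain"] - b["linked"])
    return " ".join(best["text"])


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/sports">Sports</a></div>
<div id="story"><p>The city council approved the new budget on Monday.</p>
<p>The vote followed weeks of public debate.</p></div>
<div id="ads"><a href="http://example.com/">Buy now</a></div>
</body></html>
"""
print(extract_main_text(SAMPLE))
```

Navigation and ad blocks tend to be mostly links, so they score low; real pages will need tuning of the container list and the scoring.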

Has anyone done, contemplated, or used something similar? Maybe there is
already a way, using Lucene or even Hadoop.

Best Regards,
-C.A.


Re: solrj client in mven repository?

2008-06-09 Thread Alexander Ramos Jardim
I have done mine already. It is really simple.




-- 
Alexander Ramos Jardim


Re: Solr system and numbers

2008-06-09 Thread Cam Bazz
I have a similar question:
how would one normalize, or even detect, whether a string is a phone number?




Re: XSL scripting

2008-06-09 Thread Otis Gospodnetic
Lance,

Thanks, want to put it up on the Wiki?


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Solr system and numbers

2008-06-09 Thread Otis Gospodnetic
Not sure.  Perhaps it can be done by training a language model and treating 
phone numbers as named entities?  Not sure if it would work.  But I know there 
are a few NLP people subscribed, maybe they'll have some good ideas.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 



Re: html to text based on some sort of uniqueness metric

2008-06-09 Thread Otis Gospodnetic
I have not looked at the code yet, but look for NovelAnalyzer in Lucene JIRA. 
 I believe it's supposed to do something similar.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Cam Bazz [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, June 9, 2008 3:55:16 PM
 Subject: html to text based on some sort of uniqueness metric
 
 Hello,
 
 I am indexing newspaper articles as an exercise in Solr. When dealing with
 newspaper articles in the past, I always tried to get the div or
 the table that contains the actual news, using nekohtml to traverse the
 DOM tree and extract the text from the div or table that contains the
 article. When dealing with many newspapers, it is a hassle to write custom code to
 extract the relevant information. There is usually a lot of garbage in the HTML,
 from categories to ads, and furthermore the pages change, so static coding is
 problematic.
 
 I have been thinking if I could measure the frequency or uniqueness for each
 node, and find the news automatically - but I have not come up with an
 implementation.
 
 Has anyone done/contemplated/used something similar? Maybe there is already a
 way - using Lucene, or even Hadoop.
 
 Best Regards,
 -C.A.



Re: Solr system and numbers

2008-06-09 Thread Otis Gospodnetic
Doh, I forgot.  Regular expressions worked well for me when I dealt with that 
problem many years ago.
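
A minimal sketch of the regular-expression approach, written in Python for illustration. The pattern, the length limits and the function names are assumptions, not something taken from this thread; real phone formats vary by country and would need a stricter rule set.

```python
import re

# Hypothetical pattern: an optional leading "+", then 4 to 20 characters
# drawn from digits, whitespace, parentheses, dots and dashes.
PHONE_RE = re.compile(r"^\+?[\d\s().-]{4,20}$")


def looks_like_phone(text, min_digits=4):
    """Return True if the string plausibly is a phone number."""
    text = text.strip()
    if not PHONE_RE.match(text):
        return False
    # Require a minimum number of actual digits, not just separators.
    return sum(ch.isdigit() for ch in text) >= min_digits


def normalize_phone(text):
    """Drop every non-digit so '1234', '12 34' and '1 234' index identically."""
    return re.sub(r"\D", "", text)
```

Normalizing before indexing means the value can go into an untokenized field, as suggested earlier in the thread.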

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message -----
 From: Otis Gospodnetic [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, June 9, 2008 5:36:34 PM
 Subject: Re: Solr system and numbers
 
 Not sure.  Perhaps it can be done by training a language model and treating
 phone numbers as named entities?  Not sure if it would work.  But I know there
 are a few NLP people subscribed, maybe they'll have some good ideas.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 ----- Original Message -----
  From: Cam Bazz 
  To: solr-user@lucene.apache.org
  Sent: Monday, June 9, 2008 4:24:48 PM
  Subject: Re: Solr system and numbers
  
  I got a similar question:
  how would one normalize or even detect if a string is a phone number?
  
  On Mon, Jun 9, 2008 at 4:17 PM, dudes dudes wrote:
  
  
   great info ,,, thanks a lot all
  
  
   
Date: Mon, 9 Jun 2008 05:58:50 -0700
From: [EMAIL PROTECTED]
Subject: Re: Solr system and numbers
To: solr-user@lucene.apache.org
   
Hi,
Solr/Lucene can treat phone numbers as strings.  If you want to clean
   them up and normalize them outside of Solr, you can do that and feed them
   into Solr as pure numbers.
   
How the phone numbers will be treated after you pump them into Solr
   depends on the analyzer you choose to use for this data.  If you don't 
   need
   to search on subsets of phone numbers, then just don't tokenize them (i.e.
   use string type if the phone numbers contain any non-numeric characters,
   sint otherwise).
   
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
----- Original Message -----
From: dudes dudes
To: solr-user@lucene.apache.org
Sent: Monday, June 9, 2008 2:10:20 PM
Subject: Solr system and numbers
   
   
Hello experts,
   
How does Solr deal with numbers or phone numbers? For example, if you
have 1234 and 12 34 or 1 234, with spaces between the digits.
Or is this handled by Lucene?
   
Is there any documentation or a tutorial on this?
   
many thanks,
ak
  



Re: Overall

2008-06-09 Thread Alexander Ramos Jardim
2008/6/9 Mihails Agafonovs [EMAIL PROTECTED]:

 Hi!

 Some questions:
 1) Is it possible to make Solr to use, for example, MySQL database,
 or it only supports *.xml files as a database?

If you do that, use MySQL's own full-text search capabilities and not Solr,
since Solr is built on Lucene.


 2) Is there a way to add data in the search database using some
 online interface, or the only way is manually adding the data in the
 *.xml files?

You should develop your own.
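
As a rough illustration of what "your own" can look like, here is a minimal Python sketch that builds a Solr XML update message from plain dictionaries. The function name and the field names in the usage note are hypothetical; only the `<add>/<doc>/<field name="...">` message shape is Solr's own format.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> update message from a list of dicts.

    A list or tuple value becomes several <field> elements with the same
    name, which is how multi-valued fields are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")
```

For example, `build_add_xml([{"id": 1, "title": "a & b"}])` yields a message that could be POSTed to the update handler (commonly `/solr/update`, an assumption about the deployment), followed by a `<commit/>` message to make the documents searchable.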


 3) Is there any guide on how to implement Solr to the web-site?

Solr is easy to get going with. Choose your client API and begin toying with it. Good
things will come fast. :-)


  Ar cieņu, Mihails




-- 
Alexander Ramos Jardim


Re: solrj client in maven repository?

2008-06-09 Thread Zsolt Czinkos
I've already installed the jars into my local repo, but the pom files
are very useful.

Thank you

zsolt

On Mon, Jun 9, 2008 at 10:02 PM, Alexander Ramos Jardim
[EMAIL PROTECTED] wrote:
 I have done mine already. It is really simple.

 2008/6/9 spencer.c [EMAIL PROTECTED]:


 It is not in a central repo yet, though this has been requested.  See the
 issue I filed here:
 https://issues.apache.org/jira/browse/SOLR-586

 If you follow the outline there, you can build/install into your own repo
 pretty easily.



 Zsolt Czinkos-2 wrote:
 
  Hello all
 
  I'm new to Solr and have a question about the Java client. Is it
  going to be available from the central Maven repository? I had a look and
  saw that it is under development (1.3 dev), but someone may have the
  answer.
 
  I built the trunk and solrj code seems to be separated from solr server's
  code.
 
 
  Best regards
 
  Zsolt
 
 

 --
 View this message in context:
 http://www.nabble.com/solrj-client-in-mven-repository--tp17734823p17739891.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Alexander Ramos Jardim



Re: Num docs

2008-06-09 Thread Otis Gospodnetic
Exactly.  I think I mentioned this once before several months ago.  One can 
take various hardware specs (# cores, CPU speed, FSB, RAM, etc.), performance 
numbers, etc. and come up with a number for each server's overall capacity.

 
As a matter of fact, I think this would be useful to have right in Solr, 
primarily for use when allocating and sizing shards for Distributed Search.  
JIRA enhancement/feature issue?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message -----
 From: Alexander Ramos Jardim [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, June 9, 2008 6:42:17 PM
 Subject: Re: Num docs
 
 I even think that such a decision should be based on the overall machine
 performance at a given time, and not the index size. Unless you are talking
 solely about HD space and not having any performance issues.
 
 2008/6/7 Otis Gospodnetic :
 
  Marcus,
 
 
  For that you can rely on du, vmstat, iostat, top and such, too. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  ----- Original Message -----
   From: Marcus Herou 
   To: solr-user@lucene.apache.org
   Sent: Saturday, June 7, 2008 12:33:10 PM
   Subject: Re: Num docs
  
   Thanks, I wanna ask the indices how much more each shard can handle
  before
   they're considered full and scream for a budget to get a new machine :)
  
   /M
  
   On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
   wrote:
  
Marcus, check out the Luke request handler.  You can get it from its
output.  It may also be possible to get *just* that number, but I'm not
looking at docs/code right now to know for sure.
   
 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
----- Original Message -----
 From: Marcus Herou
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 5:09:20 AM
 Subject: Num docs

 Hi.

 Is there a way to retrieve IndexWriter.numDocs() in Solr?

 Kindly

 //Marcus

 --
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/
   
   
  
  
   --
   Marcus Herou CTO and co-founder Tailsweep AB
   +46702561312
   [EMAIL PROTECTED]
   http://www.tailsweep.com/
   http://blogg.tailsweep.com/
 
 
 
 
 -- 
 Alexander Ramos Jardim



searching only within allowed documents

2008-06-09 Thread Stephen Weiss

Hi,

I'm new to Solr (and Lucene) and I'm trying to work out just how I  
could fit this technology into my app (I'm moving over from using  
MySQL fulltext indexes).  Things are actually going really well - the  
facet functionality fits in just perfectly, and the basic full-text  
searching is working very well for me as well, especially considering  
that I'm trying to index several languages at once.  It's really much,  
much faster than MySQL.  Somehow, I thought that would be the hard  
part!  Unfortunately, I'm getting tripped up on something that seems  
far more complicated...


So, there are two kinds of searches you can do in this application.   
There's an Advanced Search and a basic Text Search.  For the  
Advanced Search, users pick out one or more sets of documents which  
they are allowed to see, and some set of tags to filter by, and they  
get a list of documents.  This part is easy, I can do all of this with  
the functionality I picked up reading the docs and tutorials, and  
since my application is handling what sets of documents that my users  
can choose, Solr doesn't need to know anything about the permissions  
model.


The text search is where I'm running into trouble.  Right now, the  
application automatically filters the documents to search through with  
a join in MySQL.  In order to do this through Solr, I need to figure  
out a good way for Solr to know what sets of documents in which to  
search.


Here's what I have so far:

1)  Each document has a field folder_id, which contains one value,  
which is the ID of the folder to which the document belongs.  There  
are right now about 6000 different folders altogether.


2)  Each user is permitted to see documents from a particular subset  
of folders.  Some users can see only 100-200 folders, some users can  
see 4000-5000 folders (all depends on what they have subscribed to).


In the advanced search, in order to restrict the available documents,  
I use a filter query:  fq=folder_id:1 OR folder_id:2 etc...  In the  
advanced search, the user is only ever searching through a max of 80  
or 90 folders (and usually more like 1 or 2), so this seems quite  
workable.
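
A minimal sketch of how such a filter query might be assembled on the application side, shown in Python rather than PHP. The helper names are hypothetical; the `field:(a OR b)` grouping is standard Lucene query syntax and is equivalent to repeating `folder_id:` for every clause.

```python
from urllib.parse import urlencode


def folder_filter_query(folder_ids, field="folder_id"):
    """Build 'folder_id:(1 OR 2 ...)' from an iterable of folder ids."""
    ids = sorted(set(int(i) for i in folder_ids))  # dedupe, force integers
    if not ids:
        raise ValueError("at least one folder id is required")
    return "%s:(%s)" % (field, " OR ".join(str(i) for i in ids))


def search_params(text, folder_ids):
    """URL-encode the q and fq parameters for a Solr select request."""
    return urlencode({"q": text, "fq": folder_filter_query(folder_ids)})
```

Every id still counts as one boolean clause, so this only shortens the string; it does not avoid the clause limit for users with thousands of folders.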


However, in the plain text search, the user automatically searches  
through *all* of the folders to which they have subscribed.  This  
means, for (good!) users who have subscribed to a large (1000+) number  
of folders, the filter query would be quite long, and would exceed the  
default number of boolean parameters allowed.  Of course, I could just  
increase the limit, but the fact that a limit is there in the first  
place leads me to believe this is probably not the most scalable  
solution.


Now, I'm reading on this tutorial page for Lucene:  http://www.lucenetutorial.com/techniques/permission-filtering.html 
 that the best way to do this would involve some combination of  
HitCollector and FieldCache.  From what the author is saying, this  
sounds like exactly what I need.  Unfortunately, I am almost  
completely Java-illiterate, and on top of that, I'm not really  
finding any explanation of:


a) What exactly I would do with the HitCollector and FieldCache objects  
that would help me achieve this goal - even just at the level of  
Lucene, there's no real explanation in the tutorial

or
b) Where exactly these classes fit in to Solr (if they do at all)
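
For question (a), the general idea can be sketched outside of Java. The Python below is a conceptual illustration only, not the Lucene or Solr API: the list stands in for the per-document array that a field cache would provide (one folder_id per internal document number), and the class stands in for a hit collector whose collect(doc, score) callback is invoked once per matching document. All names here are hypothetical.

```python
def build_field_cache(folder_id_by_doc):
    """Stand-in for a field cache: position = internal doc number,
    value = that document's folder_id."""
    return list(folder_id_by_doc)


class PermissionCollector:
    """Keeps only hits whose folder is in the user's allowed set."""

    def __init__(self, folder_cache, allowed_folders):
        self.folder_cache = folder_cache
        self.allowed = frozenset(allowed_folders)  # O(1) membership checks
        self.hits = []

    def collect(self, doc, score):
        # One array lookup and one set lookup per hit; no boolean
        # clause is needed for each folder.
        if self.folder_cache[doc] in self.allowed:
            self.hits.append((doc, score))
```

The cost per query then depends on the number of matching documents rather than on the number of folders the user has subscribed to, which is why this pattern scales better than a very long filter query.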


So far I have already written my own (tiny, tiny) Tokenizer and  
TokenizerFactory for correctly parsing the tags that come in from the  
database, and that works great, so I'm thinking, if there's something  
I can sub-class or modify somewhere to get this working, even with my  
meager Java knowledge I could do it...  But I have no clue even where  
to start with this.  Do I need my own custom version of  
SolrIndexSearcher, or SolrRequestHandler... or some other class I  
haven't even gotten to yet?


If it helps, I am using version 1.2, and trying to integrate this with  
a LAMP-based application.  I already have hooks set up to allow PHP to  
index documents, query solr, and parse responses.  Since everything  
else is already working so well, and it's just a matter of getting  
permissions working, I would really, really like to stick with Solr.   
Has anyone done anything like this or can point me in the right  
direction?  I can figure out the mechanics of getting the list of  
allowed folder_ids to Solr, all I really need to know is what kind of  
modifications I would need to make, where, to get Solr to limit the  
search to a particular subset of documents without using a gigantic  
filter query.


Many thanks for any advice.  My apologies if this has been asked a  
million times before; I am new to the list, but I did read and  
search through the archives and didn't really find anything on this  
subject.


Best regards,
Steve


Re: Overall

2008-06-09 Thread Shalin Shekhar Mangar
2) Take a look at DataImportHandler for indexing data at
http://wiki.apache.org/solr/DataImportHandler

2008/6/10 Alexander Ramos Jardim [EMAIL PROTECTED]:

 2008/6/9 Mihails Agafonovs [EMAIL PROTECTED]:

  Hi!
 
  Some questions:
  1) Is it possible to make Solr to use, for example, MySQL database,
  or it only supports *.xml files as a database?

 If you do that, use MySQL's own full-text search capabilities and not Solr,
 since Solr is built on Lucene.

 
  2) Is there a way to add data in the search database using some
  online interface, or the only way is manually adding the data in the
  *.xml files?

 You should develop your own.

 
  3) Is there any guide on how to implement Solr to the web-site?

 Solr is easy to get going with. Choose your client API and begin toying with it. Good
 things will come fast. :-)

 
   Ar cieņu, Mihails




 --
 Alexander Ramos Jardim




-- 
Regards,
Shalin Shekhar Mangar.